Semantic Data Access
Semantic CMS Community
Co-funded by the European Union
Copyright IKS Consortium
May 11, 2015
www.iks-project.eu

Part I: Foundations
  Introduction of Content Management
  Foundations of Semantic Web Technologies
Part II: Semantic Content Management
  Storing and Accessing Semantic Data
  Knowledge Interaction and Presentation
  Knowledge Representation and Reasoning
  Semantic Lifting
Part III: Methodologies
  Designing Interactive Ubiquitous IS
  Requirements Engineering for Semantic CMS
  Designing Semantic CMS
  Semantifying your CMS

What is this Lecture about?
We have learned ...
  ... which languages can be used to model knowledge.
  ... how to extract knowledge from content in an automatic way (semantic lifting).
We need a way ...
  ... to store the extracted knowledge technically in an accessible way.

Outline
Semantic Data
  Semantic Web
  RDF
Semantic Data Storage
  Triple Stores
Semantic Data Access
  SPARQL
  RQL
  API Calls

Semantic Data
Stands for machine-understandable information
Allows computers to interpret the data without user intervention
Allows computers to act intelligently without being programmed for each task

Semantic Data
Provides infrastructure to get practical results
  Applications find out subsequent information based on the previous relations (e.g. Eiffel Tower -> Paris -> France)
Allows reasoning capabilities
  Providing extraction of related information which is not directly linked
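The Eiffel Tower example can be sketched as a chain of lookups over stored relations. This is only an illustration and not part of the original slides: the relation name `locatedIn`, the sample data and the class `RelationChain` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: follow one relation transitively.
class RelationChain {
    // Assumed sample data: each entry reads "key locatedIn value".
    static final Map<String, String> LOCATED_IN = Map.of(
            "Eiffel Tower", "Paris",
            "Paris", "France");

    // Returns every place reachable from 'start' by repeatedly following locatedIn.
    static List<String> follow(String start) {
        List<String> path = new ArrayList<>();
        String current = LOCATED_IN.get(start);
        while (current != null && !path.contains(current)) { // guard against cycles
            path.add(current);
            current = LOCATED_IN.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // The tower is never linked to France directly, yet France is derived.
        System.out.println(follow("Eiffel Tower")); // prints [Paris, France]
    }
}
```

The second hop is the "not directly linked" information: it is found only because the first result is used as the starting point of a further lookup.
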
Semantic Web
A classical generic description: “Web of data”
Extends the World Wide Web by encouraging:
  A common language for representing data, transformable to/from disparate sources such as relational databases, XML, etc. (RDF)
  Common reusable data models to represent data from different domains in common terms (RDFS, OWL, etc.)
  Rules that enable applications to reason over the information (SWRL)

Semantic Web Layer Cake
[Figure: Semantic Web Layer Cake. Image source: http://www.w3.org/2007/03/layerCake.svg]

Semantic Web
Many organizations publish their data in different domains
  Media
  Geographic
  Government
  …
The whole set contains approximately 30 billion triples
One of the largest collections is DBpedia
  Semantified version of Wikipedia
  Example: obtain the cities of China that have a population over 20 million
Needs efficient storage and querying of semantic data

Representation of Semantic Data
RDF
  The common data format
  An abstract model with several serialization formats
  Consists of statements, referred to as triples, of the form (subject, predicate, object), where
    Subject: any resource identifier
    Predicate: a resource identifier of any property
    Object: either a resource identifier or a literal value
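A minimal sketch of the triple form in Java, assuming Java 16+ records. The type names `Node` and `Triple`, the example.org URIs and the output notation (loosely modeled on N-Triples, without escaping) are illustrative assumptions, not an RDF library API.

```java
// Hypothetical sketch of the (subject, predicate, object) form.
// A node is either a resource identifier or a literal value.
record Node(String value, boolean literal) {
    static Node uri(String uri) { return new Node(uri, false); }
    static Node lit(String text) { return new Node(text, true); }

    @Override public String toString() {
        return literal ? "\"" + value + "\"" : "<" + value + ">";
    }
}

record Triple(Node subject, Node predicate, Node object) {
    Triple {
        // Only the object position may hold a literal.
        if (subject.literal() || predicate.literal()) {
            throw new IllegalArgumentException("subject and predicate must be resources");
        }
    }

    @Override public String toString() {
        return subject + " " + predicate + " " + object + " .";
    }
}

class TripleDemo {
    public static void main(String[] args) {
        Triple t = new Triple(
                Node.uri("http://example.org/EiffelTower"),
                Node.uri("http://example.org/locatedIn"),
                Node.lit("Paris"));
        System.out.println(t);
    }
}
```

The constructor check encodes the rule from the slide: subject and predicate are always resource identifiers, while the object may be either kind.
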
Storing Semantic Data
Need for specialized designs for triple collections
Two modalities:
  Relational databases
  Triple stores
    Mostly used for storage
    Lots of implementations
    They can also be RDB based

Triple Store
A purpose-built database for the storage and retrieval of RDF data
An optimized place to add, remove and query triples
Each triple in the triple store complies with the form (subject, predicate, object)
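The three operations named above (add, remove, query) can be sketched with a small in-memory store. The class `TinyTripleStore`, the plain-string terms and the null-as-wildcard convention are assumptions made for this illustration only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory store; terms are plain strings for brevity.
record Statement(String subject, String predicate, String object) {}

class TinyTripleStore {
    private final Set<Statement> statements = new LinkedHashSet<>();

    boolean add(String s, String p, String o) {
        return statements.add(new Statement(s, p, o));
    }

    boolean remove(String s, String p, String o) {
        return statements.remove(new Statement(s, p, o));
    }

    // Pattern query: a null argument acts as a wildcard for that position.
    List<Statement> match(String s, String p, String o) {
        return statements.stream()
                .filter(t -> s == null || t.subject().equals(s))
                .filter(t -> p == null || t.predicate().equals(p))
                .filter(t -> o == null || t.object().equals(o))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        store.add("EiffelTower", "locatedIn", "Paris");
        store.add("Paris", "locatedIn", "France");
        store.add("Paris", "type", "City");
        // All statements about Paris, whatever the predicate.
        store.match("Paris", null, null).forEach(System.out::println);
    }
}
```

The linear scan in `match` is the part a real triple store would optimize, typically with indexes over combinations of subject, predicate and object.
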
Considering XML Databases
XML databases are existing storage systems for semi-structured data
Idea: transform RDF to XML and store it in XML databases
Yet, the XML data model is not exactly the same as the semantic data model
  The XML data model is a tree-like structure
  RDF data is represented as a graph without a hierarchy

Considering XML Databases
XML databases are not suitable for storing and querying RDF
  Only simple manipulations can be handled through XML query languages
  RDF Schema processing and inference are not possible
  The standard RDF/XML mapping is unsuitable

Monolithic approach for DB Based Triple Stores
Generic representation for all RDF schemas
Only two tables are used
  Resources table
  Triples table

Monolithic approach for DB Based Triple Stores
Triples table:
predid  subid  objid  objvalue
6       2      1
5       3      7
5       1      8
5       9      2
3       9             Sunscale
Resources table:
id  uri
1   http://www.iks.og/topics.rdfs#Hotel
2   http://www.iks.og/topics.rdfs#HotelDirections
3   http://www.oclc.org/dublincore.rdfs#title
4   http://www.iks.og/schema.rdf#Ext.Resource
5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6   http://www.w3.org/2000/01/rdf-schema#subClassOf
7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8   http://www.w3.org/2000/01/rdf-schema#Class
9   rl
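How the two tables work together can be sketched as follows: every id in a triples row is looked up in the resources table to rebuild a readable triple. The class name, the prefixed abbreviations (`iks:`, `dc:`, `rdf:`, `rdfs:`) and the placeholder `resource9` are assumptions for this illustration; the URI of row 9 is cut off in the slide and is not reconstructed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-table layout shown above.
class MonolithicStore {
    // One row of the triples table; objId is null when the object is a literal.
    record Row(int predId, int subId, Integer objId, String objValue) {}

    // Resources table (id -> uri), abbreviated to prefixed names for readability.
    static final Map<Integer, String> RESOURCES = Map.of(
            1, "iks:Hotel",
            2, "iks:HotelDirections",
            3, "dc:title",
            5, "rdf:type",
            6, "rdfs:subClassOf",
            7, "rdf:Property",
            8, "rdfs:Class",
            9, "resource9"); // assumed placeholder, the real URI is truncated in the slide

    static final List<Row> TRIPLES = List.of(
            new Row(6, 2, 1, null),
            new Row(5, 3, 7, null),
            new Row(5, 1, 8, null),
            new Row(5, 9, 2, null),
            new Row(3, 9, null, "Sunscale"));

    // Rebuilds readable triples by looking every id up in the resources table.
    static List<String> resolve() {
        List<String> out = new ArrayList<>();
        for (Row r : TRIPLES) {
            String object = r.objId() != null
                    ? RESOURCES.get(r.objId())
                    : "\"" + r.objValue() + "\"";
            out.add(RESOURCES.get(r.subId()) + " " + RESOURCES.get(r.predId()) + " " + object);
        }
        return out;
    }

    public static void main(String[] args) {
        resolve().forEach(System.out::println);
    }
}
```

In a relational database each of these lookups is a join between the triples table and the resources table, which is the cost of keeping one generic representation for all schemas.
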
Triple Stores
Can be categorized into three categories:
  In-memory triple stores
    Used for certain operations like benchmarking, caching, etc.
  Native triple stores
    Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
  Non-memory, non-native triple stores
    Are built on third-party databases (Jena SDB, KAON, …)

Functionalities provided by Triple Stores
RDBMS support
General RDF model access
Query language support in the store, such as RQL and SPARQL
Some stores provide:
  Provenance: tracking of who-said-what
  APIs for accessing the triple store over the network
Very few stores provide:
  Full-text search
  Inference and rule languages

Example Triple Store implementations
RDFSuite
  Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
  Based on an ORDBMS model
Sesame
  http://www.openrdf.org/
  Relational databases (MySQL, PostgreSQL, Oracle)
Jena
  http://www.hpl.hp.com/semweb/jena2.htm
  Relational databases (MySQL, PostgreSQL, Oracle)
Virtuoso
  http://virtuoso.openlinksw.com/
  Native RDF Quad Storage (Physical Quads)

RDFSuite (ICS-Forth)*
* IST-1999-13479 C-Web, IST-2000-26074 Mesmuses
How triples are stored and accessed in RDFSuite
Separate tables are created to store resources
  Properties, subClasses, subProperties and instances
Indices on attributes like URI, source and target
Querying is possible through RQL

How triples are stored and accessed in RDFSuite
[Figure from *]
*Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001

Sesame Architecture
DBMS-independent API for accessing triple repositories
SAIL (Storage And Inference Layer) API
  A set of Java interfaces between the other modules and the repository
  Abstracts from the actual storage mechanism
Query Module
  RQL support
Different ways to communicate with clients
  Through protocol handlers
*Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
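The layering idea can be sketched with a small interface that hides the storage mechanism from the query module. The names `StorageLayer`, `MemoryLayer` and `QueryModule` are hypothetical and deliberately simplified; this is not the actual Sesame SAIL API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical interface, NOT the real SAIL API: it only illustrates the layering.
interface StorageLayer {
    void addStatement(String s, String p, String o);

    // A null argument matches any value in that position.
    List<String[]> getStatements(String s, String p, String o);
}

// One possible backend; a database-backed class could implement the same interface.
class MemoryLayer implements StorageLayer {
    private final List<String[]> rows = new ArrayList<>();

    @Override public void addStatement(String s, String p, String o) {
        rows.add(new String[] {s, p, o});
    }

    @Override public List<String[]> getStatements(String s, String p, String o) {
        return rows.stream()
                .filter(r -> (s == null || r[0].equals(s))
                          && (p == null || r[1].equals(p))
                          && (o == null || r[2].equals(o)))
                .collect(Collectors.toList());
    }
}

// The query module depends only on the interface, never on a concrete backend.
class QueryModule {
    private final StorageLayer storage;

    QueryModule(StorageLayer storage) { this.storage = storage; }

    int countByPredicate(String predicate) {
        return storage.getStatements(null, predicate, null).size();
    }
}

class LayeringDemo {
    public static void main(String[] args) {
        StorageLayer layer = new MemoryLayer();
        layer.addStatement("ex:hotel1", "rdf:type", "ex:Hotel");
        layer.addStatement("ex:hotel1", "dc:title", "Sunscale");
        System.out.println(new QueryModule(layer).countByPredicate("rdf:type"));
    }
}
```

Swapping the backend means passing a different `StorageLayer` implementation to `QueryModule`; the query code itself stays unchanged.
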
SAIL API over PostgreSQL
PostgreSQL
  Object-relational DBMS
  Supports sub-table relations between its tables, which are used to provide RDF Schema class and property subsumption
Individuals are represented in separate tables created for resources
Difficult to add new tables
*Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002

SAIL API over MySQL
MySQL
  The database schema does not change when the RDFS changes
  This is an advantage where the RDFS is unstable
*Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002

Jena2 Architecture
*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
Jena2
Denormalized schema
  Avoids unnecessary joins by storing URIs and literals directly in the statement table
Multiple statement tables
  Better locality and caching
Property tables

Normalized vs Denormalized Tables
Property Tables
Triple Store Only:
Subject  Property   Object
person1  name       Alice
person1  age        32
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  name       Bob
person2  age        35
person2  adopteeOf  person6
person2  friendOf   person8
person2  gender     male
Triple Store combined with a Person Property Table:
Triple Store:
Subject  Property   Object
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  adopteeOf  person6
person2  friendOf   person8
Person Property Table:
ID  name   age  gender
p1  Alice  32   -
p2  Bob    35   male
*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
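The split shown in the tables above can be sketched as a routing step: statements whose property has a dedicated column go into the person property table, everything else stays in the ordinary triple table. The class `PersonStore` and its fields are assumptions for illustration and do not reflect Jena2's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: cluster selected properties into one row per subject.
class PersonStore {
    record Stmt(String subject, String property, String object) {}

    // Properties that get their own column in the person property table.
    static final Set<String> COLUMNS = Set.of("name", "age", "gender");

    // One row per subject (column -> value); a missing column is the "-" cell above.
    final Map<String, Map<String, String>> personTable = new LinkedHashMap<>();
    // Everything that does not fit a column stays in the ordinary triple table.
    final List<Stmt> tripleTable = new ArrayList<>();

    void store(String subject, String property, String object) {
        if (COLUMNS.contains(property)) {
            personTable.computeIfAbsent(subject, k -> new LinkedHashMap<>())
                       .put(property, object);
        } else {
            tripleTable.add(new Stmt(subject, property, object));
        }
    }

    public static void main(String[] args) {
        PersonStore db = new PersonStore();
        db.store("person1", "name", "Alice");
        db.store("person1", "age", "32");
        db.store("person1", "twinOf", "person2");
        db.store("person2", "name", "Bob");
        db.store("person2", "gender", "male");
        System.out.println(db.personTable);
        System.out.println(db.tripleTable);
    }
}
```

Reading a person's name and age then touches a single row, whereas the triple-store-only layout needs one row per property.
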
Jena Persistence Options
SDB
  Scalable storage and query for RDF
  Specifically designed for SPARQL support
  Supports MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
  Scales to graphs of 100 million triples

Jena Persistence Options
TDB
  Provides large-scale storage and query of RDF datasets using a pure Java engine
  Supports SPARQL
  A non-transactional, faster database solution for use by a single system
  It scales well beyond SDB and is simpler to set up

Virtuoso
General-purpose RDBMS with extensive RDF adaptations
RDF data is stored as RDF quads, i.e. (graph, subject, predicate, object) tuples, which supports RDF with named graphs
  The columns are G for graph, S for subject, P for predicate and O for object
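A minimal sketch of the quad form: each statement carries the graph it belongs to, so a query can be restricted to one named graph. The class `QuadDemo`, the graph URIs and the sample data are assumptions; this does not show Virtuoso's actual storage or API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of quad storage: each statement also records its graph.
class QuadDemo {
    // Components named after the column letters G, S, P, O.
    record Quad(String g, String s, String p, String o) {}

    static final List<Quad> QUADS = List.of(
            new Quad("http://example.org/graphs/geo", "Paris", "locatedIn", "France"),
            new Quad("http://example.org/graphs/geo", "EiffelTower", "locatedIn", "Paris"),
            new Quad("http://example.org/graphs/people", "Alice", "livesIn", "Paris"));

    // Restricting a query to one named graph is a filter on the G column.
    static List<Quad> inGraph(String graph) {
        return QUADS.stream()
                .filter(q -> q.g().equals(graph))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        inGraph("http://example.org/graphs/geo").forEach(System.out::println);
    }
}
```
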
Querying Semantic Data
Semantic data can be queried from triple stores by
  Various query languages
    SPARQL
      Different endpoints provided
    RQL
    RDQL
    SeRQL
    …
  API calls
    Through proprietary APIs of different projects
  Linked Data

SPARQL
Is an RDF query language
Standardized by the W3C
Similar in concept to SQL for databases
  Syntactically resembles SQL
  Queries RDF graphs instead of databases
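To show the SQL-like syntax, the example mentioned earlier in this lecture (cities of China with a population over 20 million) could be written roughly as follows, held here in a Java text block (Java 15+). The terms `dbo:City`, `dbo:country` and `dbo:populationTotal` are assumed DBpedia ontology names and should be checked against the current DBpedia ontology; the class name is hypothetical.

```java
// Hypothetical sketch: a SPARQL query held in a Java text block (Java 15+).
class SparqlExample {
    // Assumption: dbo:City, dbo:country and dbo:populationTotal are the relevant
    // DBpedia ontology terms; verify them before relying on this query.
    static final String LARGE_CITIES_OF_CHINA = """
            PREFIX dbo: <http://dbpedia.org/ontology/>
            PREFIX dbr: <http://dbpedia.org/resource/>

            SELECT ?city ?population
            WHERE {
              ?city a dbo:City ;
                    dbo:country dbr:China ;
                    dbo:populationTotal ?population .
              FILTER (?population > 20000000)
            }
            ORDER BY DESC(?population)
            """;

    public static void main(String[] args) {
        System.out.println(LARGE_CITIES_OF_CHINA);
    }
}
```

SELECT, WHERE and ORDER BY mirror SQL, but the WHERE clause contains triple patterns that are matched against an RDF graph instead of conditions over table rows.
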
SPARQL Endpoints
Provide functionality to query the knowledge base via the SPARQL language
Accept queries and return results over the HTTP protocol
Query results can be returned in different formats, such as RDF, XML, HTML, JSON and CSV
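A sketch of how a client might prepare such an HTTP request with the JDK's `java.net.http` classes (Java 11+). The request is only built, not sent. The helper class `EndpointRequest` and the sample query are assumptions; the `query` parameter and the `application/sparql-results+json` media type come from the W3C SPARQL protocol and result format specifications.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: prepare a SPARQL protocol GET request without sending it.
class EndpointRequest {
    static HttpRequest build(String endpoint, String sparql) {
        // The SPARQL protocol passes the query text in the "query" parameter.
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                // Ask for JSON results; other media types select other formats.
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://dbpedia.org/sparql",
                "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5");
        System.out.println(request.method() + " " + request.uri());
        // To execute it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

Changing the Accept header is how a client would ask the same endpoint for one of the other result formats, provided the endpoint supports it.
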
Semantic Data Access With API Calls
Open source projects provide APIs to manipulate RDF data
  Jena
  Apache Clerezza
  Sesame
  JRDF

Jena
Jena provides a rich API to manipulate the RDF stored in the underlying triple store
  Model to represent graphs
  CRUD methods for triples
  Querying methods for existing resources
See the next slide for the code snippet…

Jena Code Snippet
// Package names are those of Apache Jena; Jena 2 releases use com.hp.hpl.jena instead of org.apache.jena
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

String personURI = "http://somewhere/JohnSmith";
String givenName = "John";
String familyName = "Smith";
String fullName = givenName + " " + familyName;

// create an empty Model which represents an RDF graph
Model model = ModelFactory.createDefaultModel();

// create the resource which will produce the triples in the next slide
Resource johnSmith = model.createResource(personURI)
    .addProperty(VCARD.FN, fullName)
    .addProperty(VCARD.N, model.createResource()
        .addProperty(VCARD.Given, givenName)
        .addProperty(VCARD.Family, familyName));

Jena
The code snippet in the previous slide creates the following triples:
(<http://somewhere/JohnSmith>, VCARD.FN, “John Smith”)
(<http://somewhere/JohnSmith>, VCARD.N, _)
(_, VCARD.Given, “John”)
(_, VCARD.Family, “Smith”)
• Note that the _ symbol represents a blank node

Apache Clerezza
Provides a uniform API, independent of the different triple stores it supports
Its API provides a model to represent RDF graphs and to manipulate those graphs
Also provides a SPARQL endpoint to query the stored knowledge

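Apache Clerezza Code Snippet
Simple code snippet adding two triples to the graph:
// Imports assume the org.apache.clerezza.rdf.core API that the snippet is written against
import org.apache.clerezza.rdf.core.LiteralFactory;
import org.apache.clerezza.rdf.core.MGraph;
import org.apache.clerezza.rdf.core.UriRef;
import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
import org.apache.clerezza.rdf.core.impl.TripleImpl;

// rdf:type, foaf:Person and VCARD:FN are written out as full URIs
// (vCard namespace as used by Jena's VCARD vocabulary)
String base = "http://www.example.org#";
MGraph g = new SimpleMGraph();
g.add(new TripleImpl(
    new UriRef(base + "JohnSmith"),
    new UriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    new UriRef("http://xmlns.com/foaf/0.1/Person")));
g.add(new TripleImpl(
    new UriRef(base + "JohnSmith"),
    new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
    LiteralFactory.getInstance().createTypedLiteral("John")));
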
Linked Data
Interrelated datasets on the Web that computers can explore
Has a standard format for being accessed and managed
Provides integration and reasoning over a huge amount of data on the Web

Linked Data
Four famous principles of Linked Data, presented by Tim Berners-Lee:
  Use URIs as names for things
  Use HTTP URIs so that people can look up (dereference) those names
  When a URI is dereferenced, provide useful information in a standard format (RDF, SPARQL)
  Provide links to other URIs to make the discovery of related data possible
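Dereferencing a URI, as the second and third principles describe, might look like this on the client side: an HTTP GET on the thing's URI with an Accept header asking for an RDF serialization. The request is only built, not sent. The class `Dereference` and the example URI http://dbpedia.org/resource/Paris are assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: look up a "thing" by its HTTP URI and ask for RDF.
class Dereference {
    static HttpRequest rdfRequest(String thingUri) {
        return HttpRequest.newBuilder(URI.create(thingUri))
                // Content negotiation: prefer Turtle, accept RDF/XML as a fallback.
                .header("Accept", "text/turtle, application/rdf+xml;q=0.8")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = rdfRequest("http://dbpedia.org/resource/Paris");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

The RDF returned for one URI contains further URIs, and repeating the same request on those is how the fourth principle (discovery of related data) works in practice.
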
Linking Open Data Project
Is a W3C SWEO project
Aims to make data freely available to everyone
Aims to publish open datasets as RDF and to set semantic relationships between them
  Serves information in a machine-readable format
  Enriches content
  Reduces duplication
The number of linked datasets is increasing rapidly
  A large number of datasets are already linked

[Figure: Linked Datasets as of October 2008]
[Figure: Linked Datasets as of September 2010]
[Figure: Linked Datasets as of 2011]

Access Data In The Cloud
Follow the RDF links representing the “things”
SPARQL endpoints
Ready-to-use software to discover linked data (see the next slide)

Linked Data Applications
Lots of applications on top of linked data
  Tabulator
  Marbles
  OpenLink RDF Browser
  …
Just search the web for
  RDF crawlers
  RDF browsers
Also see the following link containing a number of linked data applications: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications

Available SPARQL Endpoints
http://dbpedia.org/sparql
http://www4.wiwiss.fu-berlin.de/dblp/
To see possible SPARQL endpoints providing a certain URI, see http://void.rkbexplorer.com/endpoint-search/
See also a list of live SPARQL endpoints: http://www.w3.org/wiki/SparqlEndpoints

References
http://www.w3.org/TR/rdf-sparql-query
http://jena.sourceforge.net/tutorial/RDF_API/index.html
http://www.slideshare.net/ldodds/sparql-tutorial
http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semantic-web?src=related_normal&rel=1702851
http://www.cambridgesemantics.com/2008/09/sparql-by-example
http://linkeddata-specs.info/
http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
http://jena.sourceforge.net/DB/index.html
http://virtuoso.openlinksw.com/