From the Semantic Web to the Web of Data: ten years of linking up. Davide Palmisano, Fondazione Bruno Kessler. Lugano, 30-03-2010

From the Semantic Web to the Web of Data: ten years of linking up

Sep 08, 2014


Presentation given at Lugano Java User Group. 30 March 2010
Page 1: From the Semantic Web to the Web of Data: ten years of linking up

from the Semantic Web to the Web of Data: ten years of linking up

Davide Palmisano - Fondazione Bruno Kessler
Lugano, 30-03-2010

Page 2: From the Semantic Web to the Web of Data: ten years of linking up

a short ToC

story of a buzzword

concepts and ideas behind it

Linked Data: four rules, billions of opportunities

successes, failures and hopes

the server side of the triple: Java and the Semantic Web

Page 3: From the Semantic Web to the Web of Data: ten years of linking up

story of a buzzword

“To a computer, the Web is a flat, boring world devoid of meaning.”

“A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities”

“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, ”

Page 4: From the Semantic Web to the Web of Data: ten years of linking up

story of a buzzword

Page 5: From the Semantic Web to the Web of Data: ten years of linking up

story of a buzzword

Page 6: From the Semantic Web to the Web of Data: ten years of linking up

story of a buzzword

Page 7: From the Semantic Web to the Web of Data: ten years of linking up

story of a buzzword

“Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”

Page 8: From the Semantic Web to the Web of Data: ten years of linking up

story of a buzzword

The Web as a global giant decentralized database

machine-readable content metadata

typed objects and relationships

with shared semantics

Page 9: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

Page 10: From the Semantic Web to the Web of Data: ten years of linking up

How to represent knowledge?

concepts and ideas behind it

Page 11: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

How to represent knowledge?

The world’s academic communities have dealt with knowledge representation for years

artificial intelligence, natural language processing, model management and many other research fields have contributed substantially

some ancestors paved the way

Page 12: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

SHOE[1]

“SHOE is an extension to HTML which allows authors to annotate their web pages with machine-readable knowledge”

<USE-ONTOLOGY ID="cs-dept-ontology" VERSION="1.0" PREFIX="cs"
    URL="http://www.cs.umd.edu/projects/plus/SHOE/cs.html">

<CATEGORY NAME="cs.Professor" FOR="http://www.cs.umd.edu/users/hendler/">

<RELATION NAME="cs.member">
    <ARG POS=1 VALUE="http://www.cs.umd.edu/projects/plus/">
    <ARG POS=2 VALUE="http://www.cs.umd.edu/users/hendler/">
</RELATION>

<RELATION NAME="cs.name">
    <ARG POS=2 VALUE="Dr. James Hendler">
</RELATION>

Page 13: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

John Sowa’s Conceptual Graphs [2]

(...) they express meaning in a form that is logically precise, humanly readable, and computationally tractable (...)

[Conceptual graph: the concepts BOY and WALK linked by the relation AGNT (agent)]

“boy walking”

Page 14: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

adapting such approaches to an unpredictable, decentralized and potentially incoherent environment such as the Web has been the goal of a standardization effort led mainly by the W3C

Page 15: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

Resource Description Framework (RDF)

cornerstone of the Semantic Web technology stack

1999, first publication

directed and labeled graphs as data model

Page 16: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

everything can be uniquely identified by a Uniform Resource Identifier (URI)

a web page, a person, a book, an intangible thing

http://dpalmisano.myopenid.com

http://dbpedia.org/resource/Lugano

http://dbtune.org/myspace/coldplay

Page 17: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

relationships between things can be expressed with a directed, labeled graph

where

nodes can be resources or XML Schema-typed values

and relationships are also identified by URIs

Page 18: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com

http://sws.geonames.org/3165243/

Page 19: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com

http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

it’s an RDF triple

Page 20: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com

http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

http://www.geonames.org/ontology#name

Trento

Page 21: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com

http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

http://www.geonames.org/ontology#name

Trento

http://www.geonames.org/ontology#population

104946
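The small graph built up over the last slides can be written down as plain data. The following is a minimal sketch in standard Java, not the API of any RDF library: the class `TripleSketch`, its nested `Triple` type and the method `slideGraph` are hypothetical names introduced only for illustration. It stores the three statements from the slides and prints each one as an N-Triples line. As a simplifying assumption, the population is kept as a plain literal rather than an XML Schema-typed value.

```java
import java.util.List;

public class TripleSketch {

    /** One RDF statement; the object is either a URI or a plain literal. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsLiteral;

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }

        /** Serializes the statement as a single N-Triples line. */
        String toNTriples() {
            String obj = objectIsLiteral
                    // Escape backslashes and double quotes inside literals
                    ? "\"" + object.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
                    : "<" + object + ">";
            return "<" + subject + "> <" + predicate + "> " + obj + " .";
        }
    }

    /** The three statements shown on the slides (population kept as a plain literal). */
    static List<Triple> slideGraph() {
        String me = "http://dpalmisano.myopenid.com";
        String trento = "http://sws.geonames.org/3165243/";
        return List.of(
                new Triple(me, "http://xmlns.com/foaf/0.1/based_near", trento, false),
                new Triple(trento, "http://www.geonames.org/ontology#name", "Trento", true),
                new Triple(trento, "http://www.geonames.org/ontology#population", "104946", true));
    }

    public static void main(String[] args) {
        for (Triple t : slideGraph()) {
            System.out.println(t.toNTriples());
        }
    }
}
```

Every statement has the same subject, predicate, object shape, which is what lets independent data sets be merged by simply putting their triples together.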

Page 22: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

XML serialization

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dpalmisano.myopenid.com/">
    <foaf:based_near rdf:resource="http://sws.geonames.org/3165243/"/>
  </rdf:Description>
</rdf:RDF>

Page 23: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

Turtle serialization

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://dpalmisano.myopenid.com/> foaf:based_near <http://sws.geonames.org/3165243/> .

Page 24: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

N-Triples serialization (also valid N3)

<http://dpalmisano.myopenid.com/> <http://xmlns.com/foaf/0.1/based_near> <http://sws.geonames.org/3165243/> .

Page 25: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

JSON serialization

{
  "http://dpalmisano.myopenid.com" : {
    "http://xmlns.com/foaf/0.1/based_near" : [
      { "type" : "uri", "value" : "http://sws.geonames.org/3165243/" }
    ]
  }
}

Page 26: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

this triple represents a relationship between two resources

http://dpalmisano.myopenid.com http://sws.geonames.org/3165243/

http://xmlns.com/foaf/0.1/based_near

but how can we represent the meaning of that relationship?

defining vocabularies and ontologies: RDFSchema and OWL

Page 27: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://helloworld.com/ontology/Person

http://helloworld.com/ontology/father

a “Hello World” RDFSchema vocabulary

rdfs:Class rdf:Property

rdf:type

rdf:type

rdf:type rdf:type

Page 28: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://helloworld.com/ontology/Person

http://helloworld.com/resource/Michele

RDFSchema entailment: inferring new statements

http://helloworld.com/resource/Davide

http://helloworld.com/ontology/father

rdf:type

Page 29: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

http://helloworld.com/ontology/Person

http://helloworld.com/resource/Michele

RDFSchema entailment: inferring new statements

http://helloworld.com/resource/Davide

http://helloworld.com/ontology/father

rdf:type

rdf:type
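The entailment shown on these two slides can be sketched in a few lines of standard Java. This is an illustration only, under stated assumptions: the slides do not show the schema triples, so the sketch assumes that the `father` property declares `Person` as both its `rdfs:domain` and its `rdfs:range`, and that the asserted statement is "Davide father Michele". The class `RdfsEntailmentSketch` and its methods are hypothetical names, and the code applies the domain and range rules in a single pass rather than implementing a full RDFS reasoner.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RdfsEntailmentSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain";
    static final String RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range";

    /** Applies the RDFS domain and range rules once; returns only the new triples. */
    static Set<List<String>> infer(Set<List<String>> graph) {
        Set<List<String>> inferred = new LinkedHashSet<>();
        for (List<String> schema : graph) {
            boolean isDomain = schema.get(1).equals(RDFS_DOMAIN);
            boolean isRange = schema.get(1).equals(RDFS_RANGE);
            if (!isDomain && !isRange) {
                continue; // not a schema statement we handle
            }
            String property = schema.get(0);
            String cls = schema.get(2);
            for (List<String> t : graph) {
                if (!t.get(1).equals(property)) {
                    continue;
                }
                // Domain types the subject, range types the object
                String typed = isDomain ? t.get(0) : t.get(2);
                List<String> candidate = List.of(typed, RDF_TYPE, cls);
                if (!graph.contains(candidate)) {
                    inferred.add(candidate);
                }
            }
        }
        return inferred;
    }

    /** Assumed input: schema triples for "father" plus one asserted statement. */
    static Set<List<String>> sampleGraph() {
        String ns = "http://helloworld.com/";
        Set<List<String>> g = new LinkedHashSet<>();
        g.add(List.of(ns + "ontology/father", RDFS_DOMAIN, ns + "ontology/Person"));
        g.add(List.of(ns + "ontology/father", RDFS_RANGE, ns + "ontology/Person"));
        g.add(List.of(ns + "resource/Davide", ns + "ontology/father", ns + "resource/Michele"));
        return g;
    }

    public static void main(String[] args) {
        for (List<String> t : infer(sampleGraph())) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

Under these assumptions neither resource was declared a `Person` explicitly; both `rdf:type` statements follow from the vocabulary alone.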

Page 30: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

OWL allows other axioms to be specified

property cardinality restrictions

class disjointness

property transitivity

cardinality constraints

but beware: more expressivity means more reasoning complexity

interested in these topics? give [3] a try

Page 31: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

describe everything...

and more...

Page 32: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

RDFa: Bridging the traditional Web with the Semantic Web

<div rel="dc:creator">
  <span typeof="foaf:Person"
        about="http://foafbuilder.qdos.com/people/dpalmisano.myopenid.com/foaf.rdf#me">
    <a property="foaf:name" rel="foaf:homepage"
       href="http://dpalmisano.myopenid.com/">Davide Palmisano</a>
    <a rel="foaf:workplaceHomepage"
       href="http://www.fbk.eu">Fondazione Bruno Kessler</a>
  </span>
</div>

Page 33: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

SPARQL: querying the Semantic Web

based on graph pattern matching

SPARQL Protocol and RDF Query Language

four query forms: SELECT, DESCRIBE, ASK and CONSTRUCT

Page 34: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

SPARQL: querying the Semantic Web

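PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# ex: is a placeholder namespace, not given on the slide
PREFIX ex: <http://example.org/>

SELECT ?person
WHERE {
  ?person a foaf:Person .
  ?person ex:age ?age .
  FILTER(?age > 18)
}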
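To make "graph pattern matching" concrete, here is a minimal sketch in standard Java of how a query like this one could be evaluated: each triple pattern is matched against every triple, variables (terms starting with `?`) are bound consistently across patterns, and the filter is applied to the resulting bindings. It is a toy under stated assumptions, not how a real SPARQL engine such as the one in Jena is implemented; the class `PatternMatchSketch`, the sample resources `ex:alice` and `ex:bob`, and their ages are invented for the example, and prefixed names are treated as opaque strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PatternMatchSketch {

    /** Extends a binding so that the pattern matches the triple; null on mismatch. */
    static Map<String, String> unify(String[] pattern, String[] triple, Map<String, String> binding) {
        Map<String, String> result = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (term.startsWith("?")) {
                // A variable must keep the value it was bound to earlier
                String bound = result.putIfAbsent(term, triple[i]);
                if (bound != null && !bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null; // constant term does not match
            }
        }
        return result;
    }

    /** Matches all patterns in order, joining them on shared variables. */
    static List<Map<String, String>> match(List<String[]> graph, List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>());
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : solutions) {
                for (String[] triple : graph) {
                    Map<String, String> extended = unify(pattern, triple, binding);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    /** Evaluates the slide's query over invented sample data. */
    static List<String> adults() {
        List<String[]> graph = List.of(
                new String[] {"ex:alice", "rdf:type", "foaf:Person"},
                new String[] {"ex:alice", "ex:age", "30"},
                new String[] {"ex:bob", "rdf:type", "foaf:Person"},
                new String[] {"ex:bob", "ex:age", "12"});
        List<String[]> patterns = List.of(
                new String[] {"?person", "rdf:type", "foaf:Person"},
                new String[] {"?person", "ex:age", "?age"});
        List<String> result = new ArrayList<>();
        for (Map<String, String> solution : match(graph, patterns)) {
            // FILTER(?age > 18)
            if (Integer.parseInt(solution.get("?age")) > 18) {
                result.add(solution.get("?person"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(adults());
    }
}
```

The keyword `a` in the query is shorthand for `rdf:type`, which is why the first pattern in the sketch uses that predicate.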

Page 35: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

SPARQL: querying the Semantic Web

“At which universities did the founders of successful IT companies study?”

and order them by frequency...

Page 36: From the Semantic Web to the Web of Data: ten years of linking up

concepts and ideas behind it

SELECT DISTINCT ?almaMater, count(?almaMater) as ?frequency
WHERE {
  {
    { ?company a dbpedia-owl:Company }
    UNION { ?company a yago:InternetCompaniesOfTheUnitedStates }
    UNION { ?company a yago:CompaniesBasedInSiliconValley }
    UNION { ?company a yago:CompaniesListedOnNASDAQ }
  }
  ?company dbpedia-owl:numberOfEmployees ?numberOfEmpl .
  FILTER (?numberOfEmpl > 0) .
  OPTIONAL { ?company dbpedia-owl:keyPerson ?keyPerson }
  ?keyPerson dbpprop:almaMater ?almaMater .
}
ORDER BY DESC(?frequency)

Page 37: From the Semantic Web to the Web of Data: ten years of linking up

Linked Data: four rules, billions of opportunities

1. Use URIs to identify things.

2. Use HTTP URIs so that these things can be referred to and looked up ("dereference") by people and user agents.

3. Provide useful information (i.e., a structured description - metadata) about the thing when its URI is dereferenced.

4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
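Rules 2 and 3 are usually met through HTTP content negotiation: the client asks for an RDF media type in the `Accept` header when it looks up the URI of a thing. The following is a minimal sketch using the `java.net.http` API that ships with Java 11 and later; the class `DereferenceSketch` and the method `rdfRequest` are hypothetical names, and the sketch only builds and prints the request so that it runs without network access.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DereferenceSketch {

    /** Builds a GET request that asks for an RDF/XML description of the thing. */
    static HttpRequest rdfRequest(String thingUri) {
        return HttpRequest.newBuilder(URI.create(thingUri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = rdfRequest("http://dbpedia.org/resource/Lugano");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Accept: " + request.headers().firstValue("Accept").orElse(""));
    }
}
```

To actually dereference the URI, the request could be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. What the server returns, and whether it redirects first, depends on the data publisher.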

Page 38: From the Semantic Web to the Web of Data: ten years of linking up

Linked Data: four rules, billions of opportunities

DBpedia: Wikipedia as a database

extract such structured info and represent it with RDF

Page 39: From the Semantic Web to the Web of Data: ten years of linking up

Linked Data: four rules, billions of opportunities

let’s do it also for

Internet Movie Database

GeoNames

BBC /programmes

CIA factbook

CiteSeer

Musicbrainz

and for all imaginable data-intensive traditional Web sites...

Page 40: From the Semantic Web to the Web of Data: ten years of linking up

Linked Data: four rules, billions of opportunities

Page 41: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

Page 42: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

RDF is the model

SPARQL is the query language

Linked Data is the paradigm

RDFa is our Trojan horse

how does it fit with Java?

Page 43: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

general-purpose open source Semantic Web libraries

Jena[3] - The Semantic Web Java framework

- an RDF API
- parsing and writing RDF in RDF/XML, N3 and N-Triples
- an OWL API
- In-memory storage and persistence layer
- SPARQL query engine
- Schemagen: Java classes from an RDFSchema vocabulary

Page 44: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

Jena: creating a model

// Jena 2.x packages; Apache Jena 3 and later use org.apache.jena.rdf.model instead
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;

// URI declarations
String familyUri = "http://family/";
String relationshipUri = "http://purl.org/vocab/relationship/";

// Create an empty Model
Model model = ModelFactory.createDefaultModel();

// Create a Resource for each family member, identified by their URI
Resource adam = model.createResource(familyUri + "adam");
Resource beth = model.createResource(familyUri + "beth");

// Create properties for the different types of relationship to represent
Property siblingOf = model.createProperty(relationshipUri, "siblingOf");

// Add properties to adam describing relationships to other family members
adam.addProperty(siblingOf, beth);

Page 45: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

Jena: querying the model

// Create a new query passing a String containing the RDQL to execute
Query query = new Query(queryString);

// Set the model to run the query against
query.setSource(model);

// Use the query to create a query engine
QueryEngine qe = new QueryEngine(query);

// Use the query engine to execute the query
QueryResults results = qe.exec();

while (results.hasNext()) {
    ResultBinding binding = (ResultBinding) results.next();
    RDFNode definition = (RDFNode) binding.get("definition");
    System.out.println(definition.toString());
    Resource concept = (Resource) binding.get("concept");
    List wordforms = concept.listObjectsOfProperty(wordForm);
}

Page 46: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

other valuable alternatives

Sesame[4]
- a generic open source Java framework for storage and querying of RDF data
- easy, elegant and well documented

jRDF[5]
- an RDF library for Java
- notable for IoC support (Spring 2)

Page 47: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

getting RDF data

Any23[6] - Anything to Triples
- a library
- a Web service
- a CLI
- extracts RDF from various sources:
  - Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License and XFN
  - RDF/XML, Turtle and Notation3
- RDF/XML, N3, Turtle and content-negotiated serialization supported

Page 48: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

Any23: rdf extraction

Any23 runner = new Any23();
runner.setHTTPUserAgent("test-user-agent");
HTTPClient httpClient = runner.getHTTPClient();
DocumentSource source = new HTTPDocumentSource(
    httpClient,
    "http://www.rentalinrome.com/semanticloft/semanticloft.htm"
);
ByteArrayOutputStream out = new ByteArrayOutputStream();
TripleHandler handler = new NTriplesWriter(out);
runner.extract(source, handler);
String n3 = out.toString("UTF-8");

Page 49: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

Any23 deals with documents that already contain some RDF metadata

extracting semantics from free text and disambiguating terms with links to the Linked Data cloud is another story

a plethora of different services

- AlchemyAPI[7]
- OpenCalais[8]

Page 50: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

The world's largest maker of solar inverters announced Monday that it will locate its first North American manufacturing plant in Denver.

"We see a huge market coming in the U.S.," said Pierre-Pascal Urbon, the company's chief financial officer.

The company, based in Kassel, north of Frankfurt, Germany, boasts growing sales of about $1.2 billion a year.

Page 51: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

The world's largest maker of solar inverters announced Monday that it will locate its first North American manufacturing plant in Denver.

"We see a huge market coming in the U.S.," said Pierre-Pascal Urbon, the company's chief financial officer.

The company, based in Kassel, north of Frankfurt, Germany, boasts growing sales of about $1.2 billion a year.

http://dbpedia.org/resource/Frankfurt

http://dbpedia.org/resource/Denver

http://dbpedia.org/resource/Kassel
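The linking step shown here can be illustrated with a deliberately naive sketch in standard Java: a fixed lookup table (a gazetteer) maps surface names to DBpedia URIs, and any name found in the text as a whole word yields its URI. This is an assumption made for illustration only; it says nothing about how AlchemyAPI or OpenCalais work internally, and real services also have to disambiguate names that can refer to several things. The class `EntityLinkSketch` and its table are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class EntityLinkSketch {

    /** Toy gazetteer: surface name to DBpedia URI, in a fixed order. */
    static final Map<String, String> GAZETTEER = new LinkedHashMap<>();
    static {
        GAZETTEER.put("Denver", "http://dbpedia.org/resource/Denver");
        GAZETTEER.put("Kassel", "http://dbpedia.org/resource/Kassel");
        GAZETTEER.put("Frankfurt", "http://dbpedia.org/resource/Frankfurt");
    }

    /** Returns the URIs of all gazetteer names that occur in the text as whole words. */
    static List<String> link(String text) {
        List<String> uris = new ArrayList<>();
        for (Map.Entry<String, String> entry : GAZETTEER.entrySet()) {
            Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\b");
            if (wholeWord.matcher(text).find()) {
                uris.add(entry.getValue());
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        String text = "The company, based in Kassel, north of Frankfurt, Germany, "
                + "boasts growing sales of about $1.2 billion a year.";
        link(text).forEach(System.out::println);
    }
}
```

Results come back in gazetteer order, not in the order the names appear in the text.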

Page 52: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

exposed as HTTP Web services, they provide responses in XML, RDF/XML, RDFa or JSON

Apache UIMA comes with two annotators for AlchemyAPI and OpenCalais[9]

Page 53: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

indexing RDF data

SIREn[10]: Efficient semi-structured Information Retrieval for Lucene
- a plugin for Lucene
- extends the Lucene query model
- semi-structured search
- structure aware full-text search
- ranked semi-structured search: most relevant results returned first
- sub-linear average response time
- flexible semi-structured indexing

Page 54: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

storing RDF data

commonly known as “triple-stores”[11]

“let me insert triples and run SPARQL queries over them”

- OpenLink Virtuoso
- 4Store
- Redland
- Jena or Sesame over an RDBMS

Page 55: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

JDBC and Virtuoso

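// Virtuoso runs the statement as SPARQL because it starts with the "sparql" keyword
boolean more = stmt.execute("sparql select * from <gr> where { ?x ?y ?z }");
ResultSetMetaData data = stmt.getResultSet().getMetaData();
// Walk through every result set the statement produced
while (more) {
    rs = stmt.getResultSet();
    while (rs.next()) {
        ...
    }
    more = stmt.getMoreResults();
}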

Page 56: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

Empire[12]: JPA for RDF

- Object Triples Mapper
- 4Store, Sesame and Jena support
- small annotation framework for tying Java beans to RDF
- generate Java interfaces for classes described in an OWL ontology automatically based on domain, range constraints, cardinality restrictions
- runtime implementation generation
- IoC with Google Guice

Page 57: From the Semantic Web to the Web of Data: ten years of linking up

the server side of the triple: Java and the Semantic Web

crawl the Web

extract RDF from RDFa and Microformats with Any23

index the data with SIREn

store the data on HBase

in one word: Sindice.com

Page 58: From the Semantic Web to the Web of Data: ten years of linking up

Linked Data and RDFa seem to be the right way to trigger a “network effect” in the adoption of Semantic Web technologies

successes, failures and hopes

data.gov.uk

Page 59: From the Semantic Web to the Web of Data: ten years of linking up

Twine.com

successes, failures and hopes

it was the first mainstream consumer application of the Semantic Web

raised nearly $24M of venture capital over 2 rounds

Twine.com is going to be acquired by Evri.com

gaining users rapidly - faster than Twitter did in its early years

Page 60: From the Semantic Web to the Web of Data: ten years of linking up

Twine.com

successes, failures and hopes

“I can truly say they present significant challenges both to developers and to end-users. These challenges all stem from one underlying problem: Data storage.” - Nova Spivack CEO

Page 61: From the Semantic Web to the Web of Data: ten years of linking up

GoodRelations: e-commerce on the Web of Data

successes, failures and hopes

huge impact on traditional search engine ranking

enabling cross-site product and offerings retrieval

Google rich snippets

Page 62: From the Semantic Web to the Web of Data: ten years of linking up

GoodRelations: e-commerce on the Web of Data

successes, failures and hopes

GoodRelations and RDFa could heavily impact traditional SEO techniques

it may provide really powerful traction for widespread use of RDFa and semi-structured data on the Web

Page 63: From the Semantic Web to the Web of Data: ten years of linking up

/me

twitter.com/dpalmisano

Technologist @ Fondazione Bruno Kessler
Web of Data research Unit

davidepalmisano.wordpress.com

wed.fbk.eu

Page 64: From the Semantic Web to the Web of Data: ten years of linking up

a bunch of references

[1] http://www.cs.umd.edu/projects/plus/SHOE/
[2] http://www.jfsowa.com/cg/
[3] http://jena.sourceforge.net/
[4] http://www.openrdf.org/
[5] http://jrdf.sourceforge.net/
[6] http://developers.any23.org/
[7] http://alchemyapi.com
[8] http://opencalais.com
[9] http://incubator.apache.org/uima/
[10] http://siren.sindice.com/
[11] http://en.wikipedia.org/wiki/Triplestore/
[12] http://clarkparsia.com/weblog/2010/02/03/empire-0-6/