From the Semantic Web to the Web of Data: ten years of linking up
Post on 08-Sep-2014
from the Semantic Web to the Web of Data: ten years of linking up
Davide Palmisano - Fondazione Bruno Kessler
Lugano 30-03-2010
a short ToC
story of a buzzword
concepts and ideas behind it
Linked Data: four rules, billions of opportunities
successes, failures and hopes
the server side of the triple: Java and the Semantic Web
story of a buzzword
“To a computer, the Web is a flat, boring world devoid of meaning.”
“A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities”
“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning (...)”
“Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”
The Web as a global giant decentralized database
machine-readable content metadata
typed objects and relationships
with shared semantics
concepts and ideas behind it
How to represent knowledge?
the world’s academic communities have dealt with knowledge representation for years
artificial intelligence, natural language processing, model management and many other research fields contributed substantially
some ancestors paved the way
SHOE[1]
“SHOE is an extension to HTML which allows authors to annotate their web pages with machine-readable knowledge”
<USE-ONTOLOGY ID="cs-dept-ontology" VERSION="1.0" PREFIX="cs" URL= "http://www.cs.umd.edu/projects/plus/SHOE/cs.html">
<CATEGORY NAME="cs.Professor" FOR="http://www.cs.umd.edu/users/hendler/">
<RELATION NAME="cs.member"> <ARG POS=1 VALUE="http://www.cs.umd.edu/projects/plus/"> <ARG POS=2 VALUE="http://www.cs.umd.edu/users/hendler/"> </RELATION>
<RELATION NAME="cs.name"> <ARG POS=2 VALUE="Dr. James Hendler"> </RELATION>
John Sowa’s Conceptual Graphs [2]
(...) they express meaning in a form that is logically precise, humanly readable, and computationally tractable (...)
[figure: conceptual graph connecting the concepts BOY and WALK through the relation AGNT]
“boy walking”
adapting such approaches to an
unpredictable
decentralized
potentially incoherent
environment such as the Web
has been the goal of a standardization effort mainly led by the W3C
Resource Description Framework RDF
cornerstone of the Semantic Web technology stack
1999, first publication
directed and labeled graphs as data model
everything is uniquely identifiable by a Uniform Resource Identifier (URI)
a web page, a person, a book, an intangible thing
http://dpalmisano.myopenid.com
http://dbpedia.org/resource/Lugano
http://dbtune.org/myspace/coldplay
relationships between things can be expressed with a directed, labeled graph
where
nodes can be resources or XML Schema-typed values
and relationships are also identified by URIs
http://dpalmisano.myopenid.com http://xmlns.com/foaf/0.1/based_near http://sws.geonames.org/3165243/
it’s an RDF triple
http://sws.geonames.org/3165243/ http://www.geonames.org/ontology#name Trento
http://sws.geonames.org/3165243/ http://www.geonames.org/ontology#population 104946
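As a minimal sketch of this data model in plain Java, a statement can be held as subject, predicate and object and written out as one line of text. The class `Triple` and its method names are hypothetical and do not belong to any RDF library; only plain (untyped) literals are handled, so XML Schema datatypes are left out.

```java
import java.util.Objects;

/** Hypothetical minimal model of one RDF statement; not a real library API. */
final class Triple {
    final String subject;          // URI of the resource being described
    final String predicate;        // URI naming the relationship
    final String object;           // URI of a resource, or the text of a literal
    final boolean objectIsLiteral; // true when the object is a plain literal

    Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
        this.subject = Objects.requireNonNull(subject);
        this.predicate = Objects.requireNonNull(predicate);
        this.object = Objects.requireNonNull(object);
        this.objectIsLiteral = objectIsLiteral;
    }

    /** Writes the triple as one N-Triples style line (plain literals, minimal escaping). */
    String toNTriples() {
        String obj = objectIsLiteral
                ? "\"" + object.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
                : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }

    public static void main(String[] args) {
        Triple basedNear = new Triple(
                "http://dpalmisano.myopenid.com/",
                "http://xmlns.com/foaf/0.1/based_near",
                "http://sws.geonames.org/3165243/",
                false);
        Triple name = new Triple(
                "http://sws.geonames.org/3165243/",
                "http://www.geonames.org/ontology#name",
                "Trento",
                true);
        System.out.println(basedNear.toNTriples());
        System.out.println(name.toNTriples());
    }
}
```

A graph is then just a collection of such triples; the object of one statement can be the subject of another, which is what chains the statements together.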
XML serialization
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dpalmisano.myopenid.com/">
    <foaf:based_near rdf:resource="http://sws.geonames.org/3165243/"/>
  </rdf:Description>
</rdf:RDF>
Turtle serialization
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://dpalmisano.myopenid.com/> foaf:based_near <http://sws.geonames.org/3165243/> .
N3 serialization
<http://dpalmisano.myopenid.com/> <http://xmlns.com/foaf/0.1/based_near> <http://sws.geonames.org/3165243/> .
JSON serialization
{
  "http://dpalmisano.myopenid.com" : {
    "http://xmlns.com/foaf/0.1/based_near" : [
      { "type" : "uri", "value" : "http://sws.geonames.org/3165243/" }
    ]
  }
}
this triple represents a relationship between two resources
but how can we represent the meaning of that relationship?
by defining vocabularies and ontologies: RDF Schema and OWL
a “Hello World” RDF Schema vocabulary
http://helloworld.com/ontology/Person rdf:type rdfs:Class
http://helloworld.com/ontology/father rdf:type rdf:Property
RDF Schema entailment: inferring new statements
[figure: http://helloworld.com/resource/Davide and http://helloworld.com/resource/Michele connected by http://helloworld.com/ontology/father, with rdf:type edges pointing to http://helloworld.com/ontology/Person]
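The kind of inference meant here can be sketched in plain Java. This is a rough illustration under assumptions the slides do not state: that `father` declares `Person` as both its `rdfs:domain` and `rdfs:range`, and that the asserted statement reads "Davide father Michele". The class `RdfsSketch` is hypothetical, uses abbreviated names instead of full URIs, and covers only the domain and range rules of RDF Schema, not a full reasoner.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of two RDFS entailment rules; not a complete RDFS reasoner. */
final class RdfsSketch {
    static final String TYPE = "rdf:type";
    static final String DOMAIN = "rdfs:domain";
    static final String RANGE = "rdfs:range";

    /** Returns the input statements plus those derived by the domain and range rules. */
    static Set<List<String>> entail(Set<List<String>> graph) {
        Set<List<String>> result = new LinkedHashSet<>(graph);
        for (List<String> schema : graph) {
            boolean isDomain = schema.get(1).equals(DOMAIN);
            boolean isRange = schema.get(1).equals(RANGE);
            if (!isDomain && !isRange) {
                continue; // not a schema axiom this sketch understands
            }
            String property = schema.get(0);
            String cls = schema.get(2);
            for (List<String> statement : graph) {
                if (statement.get(1).equals(property)) {
                    // domain types the subject, range types the object
                    String typed = isDomain ? statement.get(0) : statement.get(2);
                    result.add(List.of(typed, TYPE, cls));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> graph = new LinkedHashSet<>();
        graph.add(List.of("hw:father", DOMAIN, "hw:Person")); // assumed axiom
        graph.add(List.of("hw:father", RANGE, "hw:Person"));  // assumed axiom
        graph.add(List.of("res:Davide", "hw:father", "res:Michele"));
        for (List<String> statement : entail(graph)) {
            System.out.println(String.join(" ", statement));
        }
    }
}
```

Under these assumptions the single asserted `father` statement is enough to derive that both resources are of type `Person`, without either fact being written down explicitly.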
OWL allows you to specify further axioms
property cardinality restrictions
class disjointness
property transitivity
but beware: more expressivity means more reasoning complexity
interested in these topics? give [3] a try
describe everything...
and more...
RDFa: Bridging the traditional Web with the Semantic Web
<div rel="dc:creator">
  <span typeof="foaf:Person" about="http://foafbuilder.qdos.com/people/dpalmisano.myopenid.com/foaf.rdf#me">
    <a property="foaf:name" rel="foaf:homepage" href="http://dpalmisano.myopenid.com/">Davide Palmisano</a>
    <a rel="foaf:workplaceHomepage" href="http://www.fbk.eu">Fondazione Bruno Kessler</a>
  </span>
</div>
SPARQL: querying the Semantic Web
based on graph pattern matching
SPARQL Protocol and RDF Query Language
4 different query forms: SELECT, DESCRIBE, ASK and CONSTRUCT
SELECT ?person
WHERE {
  ?person a foaf:Person .
  ?person ex:age ?age .
  FILTER(?age > 18)
}
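To make "graph pattern matching" concrete, here is a small sketch in plain Java of how such a query could be evaluated over a list of triples. It is not how a SPARQL engine is implemented: the class `PatternMatch`, the sample data (`ex:alice`, `ex:bob`) and the abbreviated names are all made up for illustration, and the FILTER is applied by hand after matching.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of basic graph pattern matching; not a SPARQL engine. */
final class PatternMatch {

    /** Tries to extend a binding so that one pattern term matches one data term. */
    static boolean bind(String patternTerm, String dataTerm, Map<String, String> binding) {
        if (!patternTerm.startsWith("?")) {
            return patternTerm.equals(dataTerm); // constants must match exactly
        }
        String bound = binding.putIfAbsent(patternTerm, dataTerm);
        return bound == null || bound.equals(dataTerm); // variables must stay consistent
    }

    /** Returns every variable binding under which all patterns occur in the data. */
    static List<Map<String, String>> match(List<String[]> patterns, List<String[]> data) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>()); // start from one empty binding
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : solutions) {
                for (String[] triple : data) {
                    Map<String, String> candidate = new HashMap<>(partial);
                    if (bind(pattern[0], triple[0], candidate)
                            && bind(pattern[1], triple[1], candidate)
                            && bind(pattern[2], triple[2], candidate)) {
                        next.add(candidate);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
                new String[] {"ex:alice", "rdf:type", "foaf:Person"},
                new String[] {"ex:alice", "ex:age", "30"},
                new String[] {"ex:bob", "rdf:type", "foaf:Person"},
                new String[] {"ex:bob", "ex:age", "12"});
        List<String[]> patterns = List.of(
                new String[] {"?person", "rdf:type", "foaf:Person"},
                new String[] {"?person", "ex:age", "?age"});
        for (Map<String, String> solution : match(patterns, data)) {
            if (Integer.parseInt(solution.get("?age")) > 18) { // FILTER(?age > 18)
                System.out.println(solution.get("?person"));
            }
        }
    }
}
```

Each triple pattern narrows the set of candidate bindings, and a variable shared between patterns (here `?person`) acts as the join between them.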
“At which universities did the founders of successful IT companies study?”
and order them by frequency...
SELECT DISTINCT ?almaMater, count(?almaMater) as ?frequency
WHERE {
  {
    { ?company a dbpedia-owl:Company }
    UNION { ?company a yago:InternetCompaniesOfTheUnitedStates }
    UNION { ?company a yago:CompaniesBasedInSiliconValley }
    UNION { ?company a yago:CompaniesListedOnNASDAQ }
  }
  ?company dbpedia-owl:numberOfEmployees ?numberOfEmpl .
  FILTER (?numberOfEmpl > 0) .
  OPTIONAL { ?company dbpedia-owl:keyPerson ?keyPerson }
  ?keyPerson dbpprop:almaMater ?almaMater .
}
ORDER BY DESC(?frequency)
Linked Data: four rules, billions of opportunities
1. Use URIs to identify things.
2. Use HTTP URIs so that these things can be referred to and looked up ("dereference") by people and user agents.
3. Provide useful information (i.e., a structured description - metadata) about the thing when its URI is dereferenced.
4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
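Rules 2 and 3 are usually put into practice with HTTP content negotiation: the client asks for a machine-readable representation through the `Accept` header. The sketch below uses the JDK's `java.net.http` client (Java 11 or later); the class `Dereference` is hypothetical, and whether a given server answers with RDF/XML, redirects, or ignores the header is an assumption that has to be checked per site.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch of dereferencing a Linked Data URI with content negotiation. */
final class Dereference {

    /** Builds a GET request that asks for an RDF/XML representation of the resource. */
    static HttpRequest rdfRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    /** Sends the request, following redirects, and returns the response body as text. */
    static String fetch(String uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        return client.send(rdfRequest(uri), HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Calling `Dereference.fetch("http://dbpedia.org/resource/Lugano")` needs network access. Any links found in the returned description (rule 4) can be dereferenced the same way.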
DBpedia: Wikipedia as a database
extract such structured info and represent it with RDF
let’s do it also for
Internet Movie Database
GeoNames
BBC /programmes
CIA factbook
CiteSeer
MusicBrainz
and for every imaginable data-intensive traditional Web site...
the server side of the triple: Java and the Semantic Web
RDF is the model
SPARQL is the query language
Linked Data is the paradigm
RDFa is our Trojan horse
how does it fit with Java?
general-purpose open source Semantic Web libraries
Jena[3] - The Semantic Web Java framework
- an RDF API
- parsing and writing RDF in RDF/XML, N3 and N-Triples
- an OWL API
- in-memory storage and persistence layer
- SPARQL query engine
- Schemagen: Java classes from an RDF Schema vocabulary
Jena: creating a model
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.Resource;
// URI declarations
String familyUri = "http://family/";
String relationshipUri = "http://purl.org/vocab/relationship/";
// Create an empty Model
Model model = ModelFactory.createDefaultModel();
// Create a Resource for each family member, identified by their URI
Resource adam = model.createResource(familyUri + "adam");
Resource beth = model.createResource(familyUri + "beth");
// Create properties for the different types of relationship to represent
Property siblingOf = model.createProperty(relationshipUri, "siblingOf");
// Add properties to adam describing relationships to other family members
adam.addProperty(siblingOf, beth);
Jena: querying the model
// Create a new query passing a String containing the RDQL to execute
Query query = new Query(queryString);
// Set the model to run the query against
query.setSource(model);
// Use the query to create a query engine
QueryEngine qe = new QueryEngine(query);
// Use the query engine to execute the query
QueryResults results = qe.exec();
while (results.hasNext()) {
    ResultBinding binding = (ResultBinding) results.next();
    RDFNode definition = (RDFNode) binding.get("definition");
    System.out.println(definition.toString());
    Resource concept = (Resource) binding.get("concept");
    List wordforms = concept.listObjectsOfProperty(wordForm);
}
other valuable alternatives
Sesame[4] - a generic open source Java framework for storage and querying of RDF data
- easy, elegant and well documented
jRDF[5] - an RDF library for Java
- notable for IoC support (Spring 2)
getting RDF data
Any23[6] - Anything to Triples
- a library
- a Web service
- a CLI
- extracts RDF from various sources:
  - Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License and XFN
  - RDF/XML, Turtle and Notation3
- RDF/XML, N3, Turtle and content-negotiated serialization supported
Any23: RDF extraction
Any23 runner = new Any23();
runner.setHTTPUserAgent("test-user-agent");
HTTPClient httpClient = runner.getHTTPClient();
DocumentSource source = new HTTPDocumentSource(
    httpClient,
    "http://www.rentalinrome.com/semanticloft/semanticloft.htm"
);
ByteArrayOutputStream out = new ByteArrayOutputStream();
TripleHandler handler = new NTriplesWriter(out);
runner.extract(source, handler);
String n3 = out.toString("UTF-8");
Any23 deals with documents that already contain some RDF metadata
extracting semantics from free text and disambiguating terms with links into the Linked Data cloud is another story
a plethora of different services
- AlchemyAPI[7]
- OpenCalais[8]
The world's largest maker of solar inverters announced Monday that it will locate its first North American manufacturing plant in Denver.
"We see a huge market coming in the U.S.," said Pierre-Pascal Urbon, the company's chief financial officer.
The company, based in Kassel, north of Frankfurt, Germany, boasts growing sales of about $1.2 billion a year.
http://dbpedia.org/resource/Frankfurt
http://dbpedia.org/resource/Denver
http://dbpedia.org/resource/Kassel
exposed as HTTP Web services, they provide responses in XML, RDF/XML, RDFa or JSON
Apache UIMA comes with two annotators for AlchemyAPI and OpenCalais[9]
indexing RDF data
SIREn[10]: Efficient semi-structured Information Retrieval for Lucene
- a plugin for Lucene
- extends the Lucene query model
- semi-structured search
- structure-aware full-text search
- ranked semi-structured search: most relevant results returned first
- sub-linear average response time
- flexible semi-structured indexing
storing RDF data
commonly known as “triple-stores”[11]
“let me insert triples and run SPARQL queries over them”
- OpenLink Virtuoso
- 4Store
- Redland
- Jena or Sesame over an RDBMS
JDBC and Virtuoso
boolean more = stmt.execute("sparql select * from <gr> where { ?x ?y ?z }");
ResultSetMetaData data = stmt.getResultSet().getMetaData();
while (more) {
    rs = stmt.getResultSet();
    while (rs.next()) {
        ...
    }
    more = stmt.getMoreResults();
}
Empire[12]: JPA for RDF
- Object Triples Mapper
- 4Store, Sesame and Jena support
- small annotation framework for tying Java beans to RDF
- generates Java interfaces for classes described in an OWL ontology automatically, based on domain, range constraints, cardinality restrictions
- runtime implementation generation
- IoC with Google Guice
crawl the Web
extract RDF from RDFa and Microformats with Any23
index the data with SIREn
store the data on HBase
in one word: Sindice.com
Linked Data and RDFa seem to be the right way to trigger the “network effect” in the adoption of Semantic Web technologies
successes, failures and hopes
data.gov.uk
Twine.com
it was the first mainstream consumer application of the Semantic Web
raised nearly $24 million of venture capital over 2 rounds
Twine.com is going to be acquired by Evri.com
gaining users rapidly - faster than Twitter did in its early years
“I can truly say they present significant challenges both to developers and to end-users. These challenges all stem from one underlying problem: Data storage.” - Nova Spivack, CEO
GoodRelations: e-commerce on the Web of Data
huge impact on traditional search engine ranking
enabling cross-site product and offerings retrieval
Google rich snippets
GoodRelations and RDFa could heavily affect traditional SEO techniques
they may be a really powerful driver for widespread usage of RDFa and semi-structured data on the Web
/me
twitter.com/dpalmisano
Technologist @ Fondazione Bruno Kessler
Web of Data research Unit
davidepalmisano.wordpress.com
wed.fbk.eu
a bunch of references
[1] http://www.cs.umd.edu/projects/plus/SHOE/
[2] http://www.jfsowa.com/cg/
[3] http://jena.sourceforge.net/
[4] http://www.openrdf.org/
[5] http://jrdf.sourceforge.net/
[6] http://developers.any23.org/
[7] http://alchemyapi.com
[8] http://opencalais.com
[9] http://incubator.apache.org/uima/
[10] http://siren.sindice.com/
[11] http://en.wikipedia.org/wiki/Triplestore/
[12] http://clarkparsia.com/weblog/2010/02/03/empire-0-6/