An Introduction to the Semantic Web
The Internet Is Your New Database
Will Strinz
Jan 15, 2015
What’s this all about then?
Today we have the World Wide Web
What’s this all about then?
It's
Distributed
Accessible using all sorts of devices and software
Document Based
This is very flexible, but it is also
Hard to search
Unstructured and context-less - hard to consume automatically
Easy to share, hard to compose and remix
Full of all sorts of ‘homebrew’ databases and ad-hoc schema
What’s this all about then?
Database software is powerful but rarely interacted with directly
Often accessed through the web, but indirectly
Still ultimately siloed
Hard to compose and remix
What if the internet acted as one big distributed database?
Wat Do?
Move from a web of documents to a web of data
Needs to be
Structured, but still flexible
Distributed
Accessible to machine and human alike
What is this Semantic Web Thing?
Represents information as Subject, Predicate, Object triples
Subject | Predicate | Object
Will Strinz | years old | 24
Bendyworks | a | Company
Will Strinz | works at | Bendyworks
What is this Semantic Web Thing?
With URIs and typed literals
Subject | Predicate | Object
example.org/Will_Strinz | example.org/years_old | 24
example.org/Bendyworks | example.org/is_a | example.org/Company
example.org/Will_Strinz | example.org/works_for | example.org/Bendyworks
• Called the Resource Description Framework (RDF)
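As a rough illustration of the triple model, the three statements above can be held as plain Python tuples. This is only a sketch: the `Triple` type and the `objects_of` helper are names made up here, the `http://` scheme on the example URIs is an assumption, and a real application would normally use an RDF library instead.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: Union[str, int]  # a URI string, or a typed literal such as an int


EX = "http://example.org/"  # assumed namespace for the example URIs

triples = [
    Triple(EX + "Will_Strinz", EX + "years_old", 24),
    Triple(EX + "Bendyworks", EX + "is_a", EX + "Company"),
    Triple(EX + "Will_Strinz", EX + "works_for", EX + "Bendyworks"),
]


def objects_of(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of(EX + "Will_Strinz", EX + "works_for"))
```

Each statement stands alone, so adding a new fact is just appending another tuple; nothing about the existing ones has to change.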
Example
Let's start with a resource, or “thing”
We’ll call it http://example.org/Fresh, or ex:Fresh for short
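The `ex:Fresh` shorthand is just a prefix standing in for the start of the full URI. A minimal sketch of that expansion, assuming a hand-written prefix table (the `PREFIXES` dict and the `expand` and `shorten` helpers are hypothetical names, not part of any RDF library):

```python
# Hypothetical prefix table; "ex" maps to the example namespace used in the slides.
PREFIXES = {"ex": "http://example.org/"}


def expand(short_name):
    """Turn a prefixed name such as 'ex:Fresh' into a full URI."""
    prefix, sep, local = short_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {short_name!r}")
    return PREFIXES[prefix] + local


def shorten(uri):
    """Turn a full URI back into a prefixed name when a prefix matches."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known prefix; leave the URI untouched


print(expand("ex:Fresh"))
print(shorten("http://example.org/Fresh"))
```

Note that the namespace ends with a slash; without it, `ex:Fresh` would expand to `http://example.orgFresh`.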
Example
What can we say about ex:Fresh?
S | P | O
ex:Fresh | ex:enjoys | “B-Ball”
ex:Fresh | ex:aunt | ex:Vivian_Banks
[Graph: ex:Fresh -ex:enjoys-> “B-Ball”; ex:Fresh -ex:aunt-> ex:Vivian_Banks]
Example
ex:Vivian_Banks is also a resource
Example
So now we can say things about her too!
S | P | O
ex:Fresh | ex:enjoys | “B-Ball”
ex:Fresh | ex:aunt | ex:Vivian_Banks
ex:Vivian_Banks | ex:nickname | “Aunt Viv”
[Graph: ex:Fresh -ex:enjoys-> “B-Ball”; ex:Fresh -ex:aunt-> ex:Vivian_Banks; ex:Vivian_Banks -ex:nickname-> “Aunt Viv”]
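The table of triples and the graph drawing are two views of the same data. One way to see this, sketched below with plain Python structures (the `build_graph` helper is a made-up name for illustration), is to group the triples by subject so that each predicate becomes a labeled edge:

```python
from collections import defaultdict

# The three statements from the example, written with prefixed names.
triples = [
    ("ex:Fresh", "ex:enjoys", "B-Ball"),
    ("ex:Fresh", "ex:aunt", "ex:Vivian_Banks"),
    ("ex:Vivian_Banks", "ex:nickname", "Aunt Viv"),
]


def build_graph(triples):
    """Group triples by subject: node -> list of (edge label, target)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
for node, edges in graph.items():
    for predicate, target in edges:
        print(f"{node} --{predicate}--> {target}")
```

Because `ex:Vivian_Banks` appears both as an object and as a subject, it becomes a node that connects the two parts of the graph.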
Vocabularies
We’ve been defining our own predicates and objects so far
Could add details about each predicate
Isn’t this a waste of time?
Yes! Use RDF Vocabularies
Define new namespaces, terms, and objects
‘Imported’ simply by reference
Are described in RDF
FOAF Vocabulary
“Friend Of A Friend”
Located at http://xmlns.com/foaf/0.1/
foaf:name a rdf:Property, owl:DatatypeProperty;
    rdfs:label "name";
    rdfs:comment "A name for some thing.";
    rdfs:domain owl:Thing;
    rdfs:isDefinedBy foaf:;
    rdfs:range rdfs:Literal;
    rdfs:subPropertyOf rdfs:label;
    sw_ns:term_status "testing" .
Example
S | P | O
ex:Fresh | foaf:based_near | ex:West_Philadelphia
ex:Fresh | foaf:age | 22
ex:Fresh | foaf:name | “Will ‘The Fresh Prince’ Smith”
[Graph: ex:Fresh -ex:enjoys-> “B-Ball”; ex:Fresh -ex:aunt-> ex:Vivian_Banks; ex:Vivian_Banks -ex:nickname-> “Aunt Viv”; ex:Fresh -foaf:based_near-> ex:West_Philadelphia; ex:Fresh -foaf:age-> 22; ex:Fresh -foaf:name-> “Will ‘The Fresh Prince’ Smith”]
Let's look at what we have
Human readable
Simple and flexible
Rigid when necessary
Atomic statements
Dual representation
Ideally
Dereferenceable
Structured, Machine understandable
Serialization
Multiple formats
All of which are
Standardized
Interoperable
Information preserving
File or Triple Store
Serialization
N-Triples
<http://example.org/Fresh> <http://xmlns.com/foaf/0.1/name> "Will 'The Fresh Prince' Smith" .
<http://example.org/Fresh> <http://example.org/enjoys> "B-Ball" .
<http://example.org/Fresh> <http://xmlns.com/foaf/0.1/age> "22"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/Fresh> <http://example.org/aunt> <http://example.org/Vivian_Banks> .
<http://example.org/Fresh> <http://xmlns.com/foaf/0.1/based_near> <http://example.org/West_Philadelphia> .
<http://example.org/Vivian_Banks> <http://example.org/nickname> "Aunt Viv" .
Turtle
@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:Fresh foaf:name "Will 'The Fresh Prince' Smith" ;
    ex:enjoys "B-Ball" ;
    foaf:age 22 ;
    ex:aunt ex:Vivian_Banks ;
    foaf:based_near ex:West_Philadelphia .

ex:Vivian_Banks ex:nickname "Aunt Viv" .
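N-Triples is simple enough that a toy writer fits in a few lines, which may help show what "one statement per line" means in practice. The sketch below is an assumption-laden illustration, not a conforming serializer: the `IRI` marker class and the `term` and `to_ntriples` helpers are invented names, and only string and integer literals are handled.

```python
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class IRI(str):
    """Marks a string as an IRI rather than a plain string literal."""


def term(value):
    """Render one term in N-Triples syntax (IRIs, ints, and strings only)."""
    if isinstance(value, IRI):
        return f"<{value}>"
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    # Escape backslashes first, then quotes and newlines.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Write each triple on its own line, terminated by ' .'"""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


data = [
    (IRI(EX + "Fresh"), IRI(FOAF + "age"), 22),
    (IRI(EX + "Fresh"), IRI(EX + "aunt"), IRI(EX + "Vivian_Banks")),
    (IRI(EX + "Vivian_Banks"), IRI(EX + "nickname"), "Aunt Viv"),
]
print(to_ntriples(data), end="")
```

Turtle carries the same statements; its prefixes and `;` shorthand only save typing.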
Serialization
JSON-LD
{
  "@context": {
    "ex": "http://example.org/",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {
      "@id": "ex:Vivian_Banks",
      "ex:nickname": "Aunt Viv"
    },
    {
      "@id": "ex:Fresh",
      "ex:aunt": { "@id": "ex:Vivian_Banks" },
      "ex:enjoys": "B-Ball",
      "foaf:age": { "@value": "22", "@type": "http://www.w3.org/2001/XMLSchema#integer" },
      "foaf:based_near": { "@id": "ex:West_Philadelphia" },
      "foaf:name": "Will 'The Fresh Prince' Smith"
    }
  ]
}
Serialization
RDF/XML
<?xml version='1.0' encoding='utf-8' ?>
<rdf:RDF xmlns:ex='http://example.org/'
         xmlns:foaf='http://xmlns.com/foaf/0.1/'
         xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:xsd='http://www.w3.org/2001/XMLSchema#'>
  <rdf:Description rdf:about='http://example.org/Vivian_Banks'>
    <ex:nickname>Aunt Viv</ex:nickname>
  </rdf:Description>
  <rdf:Description rdf:about='http://example.org/Fresh'>
    <ex:aunt rdf:resource='http://example.org/Vivian_Banks' />
    <ex:enjoys>B-Ball</ex:enjoys>
    <foaf:age rdf:datatype='http://www.w3.org/2001/XMLSchema#integer'>22</foaf:age>
    <foaf:based_near rdf:resource='http://example.org/West_Philadelphia' />
    <foaf:name>Will 'The Fresh Prince' Smith</foaf:name>
  </rdf:Description>
</rdf:RDF>
Serialization
Other Formats
TRiG/TRiX
RDFa
N3
NQuads
Mappings onto SQLite, Mongo, etc
Querying
What new language do I have to learn just to query?
SPARQL!
But stick with me, it's not so bad
Syntax similar to SQL, but joins are free!
Much more consistent across endpoints and triple stores
Querying
How old is Will ‘The Fresh Prince’ Smith?
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
SELECT ?age WHERE {
  ex:Fresh foaf:age ?age
}
=> 22
Querying
What is Will ‘The Fresh Prince’ Smith’s Aunt’s nickname?
SELECT ?nick WHERE {
  ex:Fresh ex:aunt ?aunt .
  ?aunt ex:nickname ?nick .
}
=> “Aunt Viv”
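To give a feel for why "joins are free", here is a toy pattern matcher over the example triples. It is a sketch of the general idea behind matching triple patterns with shared variables, not how any real SPARQL engine is implemented; `is_var`, `match_pattern`, and `query` are hypothetical names, and any string starting with `?` is treated as a variable.

```python
TRIPLES = [
    ("ex:Fresh", "foaf:name", "Will 'The Fresh Prince' Smith"),
    ("ex:Fresh", "ex:enjoys", "B-Ball"),
    ("ex:Fresh", "foaf:age", 22),
    ("ex:Fresh", "ex:aunt", "ex:Vivian_Banks"),
    ("ex:Fresh", "foaf:based_near", "ex:West_Philadelphia"),
    ("ex:Vivian_Banks", "ex:nickname", "Aunt Viv"),
]


def is_var(term):
    """Variables are written like SPARQL variables: '?name'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to something else
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant in the pattern does not match
    return new


def query(patterns, triples=TRIPLES):
    """Match patterns one after another, carrying variable bindings along."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


results = query([
    ("ex:Fresh", "ex:aunt", "?aunt"),
    ("?aunt", "ex:nickname", "?nick"),
])
print([r["?nick"] for r in results])
```

The join happens simply because `?aunt` appears in both patterns: the second pattern can only match triples whose subject equals the value already bound by the first.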
Querying
What do we know about Will ‘The Fresh Prince’ Smith?
SELECT ?prop WHERE {
  ex:Fresh ?prop ?value .
}
=> foaf:name, ex:enjoys, foaf:age, ex:aunt, foaf:based_near
Querying
Many other features in SPARQL
• Update, Insert, and Delete
• Logical / Regex filters
• Subqueries
• Order and Offset
• Aggregates
• Data Types
• Construct, Describe, and Ask modes
• Property Paths
• Math functions
Getting Connected
So, RDF has some cool features
• Flexible yet structured
• Atomic
• Graph Based
• W3C Backed
• Human Friendly
• Queryable
• Serializable
• Extensible
And I can say things about Will Smith with it
But how different is that really from any other database?
Connections!
Getting Connected
Remember our ex:Fresh URI?
http://example.org/Fresh
It's not dereferenceable.
What if instead we used
http://dbpedia.org/resource/Will_Smith_(character)
Getting Connected
Suddenly we have more information
dbpedia:Will_Smith_(character) a yago:FictionalCharacter,
        dbpedia-owl:FictionalCharacter,
        yago:FictionalVersionsOfRealPeople,
        yago:ImaginaryBeing109483738,
        dbpedia-owl:Person,
        dbpedia-owl:Agent,
        yago:SitcomCharacters,
        owl:Thing,
        foaf:Person;
    rdfs:label "Will Smith (character)"@en;
Getting Connected
A lot more information
    dbpedia-owl:abstract "William \"Will\" Smith (born July 3, 1973) is a fictional character in the NBC television series, The Fresh Prince of Bel-Air."@en;
    dbpedia-owl:birthDate "1973-07-02+02:00"^^xsd:date;
    dbpedia-owl:portrayer dbp:Will_Smith;
    dbpedia-owl:series dbp:The_Fresh_Prince_of_Bel-Air;
    dbpedia-owl:wikiPageExternalLink <http://www.imdb.com/character/ch0020905/>;
    dbpprop:born "1973-07-02+02:00"^^xsd:date;
    dbpprop:family "Janice Smith"@en,
        "Hilary Banks"@en,
        "Carlton Banks"@en,
        "Vy Smith-Wilkes"@en,
        "Lisa Wilkes"@en,
        "Lou Smith"@en,
        "Ashley Banks"@en,
        "Helen Smith"@en,
        "Fred Wilkes"@en,
        "Phillip Banks"@en,
        "Vivian Banks"@en;
Getting Connected
    dbpprop:first "\"The Fresh Prince Project\""@en;
    dbpprop:hasPhotoCollection informatik:Will_Smith_(character);
    dbpprop:name "Will Smith"@en;
    dbpprop:nicknames "Master William, Prince, Fresh Prince, Will"@en;
    dbpprop:portrayer dbp:Will_Smith;
    dbpprop:series dbp:The_Fresh_Prince_of_Bel-Air;
    dbpprop:wordnet_type wn:synset-character-noun-4;
    dcterms:subject category:Fictional_characters_introduced_in_1990,
        category:Fictional_African-American_people,
        category:Fictional_versions_of_real_people,
        dbpcategory:Fictional_characters_from_Philadelphia,_Pennsylvania,
        category:Sitcom_characters;
    rdfs:comment "William \"Will\" Smith (born July 3, 1973) is a fictional character in the NBC television series, The Fresh Prince of Bel-Air."@en;
    owl:sameAs <http://rdf.freebase.com/ns/m.0417_vv>,
        <http://yago-knowledge.org/resource/Will_Smith_(character)>,
        <http://dbpedia.org/resource/Will_Smith_(character)>;
    foaf:isPrimaryTopicOf <http://en.wikipedia.org/wiki/Will_Smith_(character)>;
    foaf:name "Will Smith"@en .
Getting Connected
Human AND Machine Readable
Getting Connected
DBpedia has a SPARQL Endpoint
Lots of fun queries
SELECT * WHERE {
  ?episode dc:subject dbpcategory:The_Simpsons_%28season_14%29_episodes .
  ?episode dbpedia2:blackboard ?chalkboard_gag .
}
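A query like this can be sent to an endpoint over plain HTTP. The sketch below uses only the Python standard library and makes several assumptions: that the endpoint lives at `https://dbpedia.org/sparql`, that it accepts the query as a `query` URL parameter, that it can return JSON results, and that the prefixes used in the query are predefined on the server. The `build_url` and `run` helpers are illustrative names, and the data behind this particular query may have changed since the talk.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed endpoint URL

QUERY = """
SELECT * WHERE {
  ?episode dc:subject dbpcategory:The_Simpsons_%28season_14%29_episodes .
  ?episode dbpedia2:blackboard ?chalkboard_gag .
}
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run(query, endpoint=ENDPOINT):
    """Send the query and return the result rows (requires network access)."""
    request = urllib.request.Request(
        build_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]


# Building the URL is safe offline; calling run(QUERY) needs the network.
print(build_url(QUERY))
```

Calling `run(QUERY)` would then return one dictionary per result row, keyed by variable name, provided the endpoint behaves as assumed.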
Getting Connected
Larger goal of connecting the whole web
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize. — Sir Tim Berners-Lee (1999)
http://dgallery.s3.amazonaws.com/lod-cloud_colored.png
Going Further
Templating with RDFa or XSLT
Ontologies / OWL
Reasoning
Libraries
Ruby-RDF
Spira
Publisci and Publisci Server
Tools and interfaces - Build them!
Disclaimers
Relatively young
Less engineering time
SPARQL changing and not fully implemented in all triple stores
Flexibility has its downsides; Garbage in Garbage out
No agreed-upon method for schema constraints
End
Thanks Bendyconf Attendees and Organizers!
Questions?