NDBI040: Big Data Management and NoSQL Databases
http://www.ksi.mff.cuni.cz/~svoboda/courses/171-NDBI040/

Lecture 4
RDF Stores: SPARQL

Martin Svoboda
24. 10. 2017

Charles University in Prague, Faculty of Mathematics and Physics
Czech Technical University in Prague, Faculty of Electrical Engineering
SPARQL query language
• Graph patterns
• Filter constraints
• Solution modifiers
• Aggregation
• Query forms
RDF Stores

Data model
• RDF triples
    Components: subject, predicate, and object
    Each triple represents a statement about a real-world entity
• Triples can be viewed as graphs
    Vertices for subjects and objects
    Edges directly correspond to individual statements

Query language
• SPARQL: SPARQL Protocol and RDF Query Language

Representatives
• Apache Jena, rdf4j (Sesame), Algebraix
• Multi-model: MarkLogic, OpenLink Virtuoso
Linked Data

Linked Data
• Method of publishing structured and interlinked data in a way that allows automated processing by programs rather than browsing by human readers

Principles of Linked Open Data
• Identify resources using URIs or, even better, URLs
• Publish data about resources in standard formats via HTTP
• Mutually interlink resources to form the Web of Data
• Release the data under an open licence
Linked Open Data Cloud
[Figure: Linked Open Data Cloud, May 2007. Source: http://lod-cloud.net/]
[Figure: Linked Open Data Cloud, September 2011. Source: http://lod-cloud.net/]
[Figure: Linked Open Data Cloud, August 2017. Source: http://lod-cloud.net/]
Linked Data: Statistics
• October 2007
    25 datasets
    2 billion triples, 2 million links
• September 2011
    295 datasets
    31 billion triples, 504 million links
• August 2017
    1163 datasets
SPARQL Query Language
SPARQL

SPARQL Query Language
• Query language for RDF data
    Graph patterns, optional graph patterns, subqueries, negation, aggregation, value constructors, …
• Versions: 1.0 (2008), 1.1 (2013)
• W3C recommendations
    https://www.w3.org/TR/sparql11-query/
    Altogether 11 recommendations: query language, update facility, federated queries, protocol, result formats, …
SELECT Queries

    prologue declarations
    SELECT clause
    FROM clause
    WHERE clause
    solution modifiers

• Prologue declarations – PREFIX, BASE
• Main clauses
    SELECT – variables to be projected
    FROM – data graphs to be queried
    WHERE – graph patterns to be matched
• Solution modifiers – ORDER BY, …
Prologue Declarations

• Simplify IRI references by declaring base IRIs

    BASE IRI reference
    PREFIX prefix name : IRI reference

BASE clause
• One single base IRI is defined
    All relative IRI references are then resolved against this base IRI

PREFIX clause
• Several base IRIs are defined, each associated with a name
    All prefixed names are then resolved against the respective base IRI
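Prologue Declarations: Examples
• When BASE <http://db.cz/> is defined, then a relative IRI reference terms#Movie is interpreted as http://db.cz/terms#Movie
• When PREFIX i: <http://db.cz/> is defined, then a prefixed name i:terms\#Movie is interpreted as http://db.cz/terms#Movie
    The # has to be escaped inside a prefixed name because an unescaped # starts a comment; declaring PREFIX i: <http://db.cz/terms#> and writing i:Movie avoids the escape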
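The two expansion rules can be sketched in a few lines of Python. This is only an illustration, not how any particular RDF store implements it: the helper names resolve_relative and expand_prefixed are hypothetical, the standard urljoin function stands in for full IRI resolution, and the prefix i: is bound to http://db.cz/terms# as in the later query examples.

```python
from urllib.parse import urljoin


def resolve_relative(base_iri, relative_ref):
    """Resolve a relative IRI reference against the BASE IRI (approximated by urljoin)."""
    return urljoin(base_iri, relative_ref)


def expand_prefixed(prefixes, prefixed_name):
    """Expand prefix:local by concatenating the declared IRI and the local part."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


# Hypothetical prologue: BASE <http://db.cz/> and PREFIX i: <http://db.cz/terms#>
base = "http://db.cz/"
prefixes = {"i": "http://db.cz/terms#"}

print(resolve_relative(base, "terms#Movie"))
print(expand_prefixed(prefixes, "i:Movie"))
```

Both calls yield the same absolute IRI, which is why the two notations are interchangeable in a query.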
WHERE Clause

• Prescribes one group graph pattern

    WHERE group graph pattern

Types of graph patterns
• Basic – triple patterns to be matched
• Group – set of graph patterns to be matched
• Optional – graph pattern to be matched only if possible
• Alternative – two or more alternative graph patterns
• …

Graph patterns can be inductively combined into complex ones
Graph Patterns: Basic Graph Pattern

Basic graph pattern (triple block)
One or more triple patterns to be all matched
• Ordinary triples separated by .
• … or their abbreviated forms inspired by Turtle notation
    Object lists using ,
    Predicate-object lists using ;
    Blank nodes using []

Examples
• s p1 o1 . s p1 o2 . s p2 o3 .
• s p1 o1 , o2 ; p2 o3 .
Graph Patterns: Basic Graph Pattern

Interpretation
• All the involved triple patterns must be matched
    I.e. we combine them as if they were in conjunction
    More precisely…
    – Each triple pattern is evaluated to its solution sequence
    – All combinations of compatible solutions are then found
• Note that all the variables need to be bound
    I.e. if any of the involved variables cannot be bound at all, then the entire basic graph pattern cannot be matched!
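The interpretation above can be sketched as a small in-memory evaluator. This is a simplified illustration under stated assumptions: triples are plain tuples of strings, variables are strings starting with ?, and the sample graph (including the year of Medvídek) is made up for the example. Real stores use indexes and query planning rather than nested loops.

```python
def is_var(term):
    return term.startswith("?")


def match_triple(pattern, triple):
    """Return the bindings under which pattern equals triple, or None."""
    solution = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if solution.get(p, t) != t:  # same variable used twice with different values
                return None
            solution[p] = t
        elif p != t:
            return None
    return solution


def compatible(a, b):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def evaluate_bgp(patterns, graph):
    """Evaluate each triple pattern, then combine all compatible solutions."""
    solutions = [{}]
    for pattern in patterns:
        matches = []
        for triple in graph:
            m = match_triple(pattern, triple)
            if m is not None:
                matches.append(m)
        solutions = [{**s, **m} for s in solutions for m in matches if compatible(s, m)]
    return solutions


# Hypothetical sample data: the second movie has no i:year triple.
graph = [
    ("m:medvidek", "rdf:type", "i:Movie"),
    ("m:medvidek", "i:title", "Medvídek"),
    ("m:medvidek", "i:year", "2007"),
    ("m:samotari", "rdf:type", "i:Movie"),
    ("m:samotari", "i:title", "Samotáři"),
]
patterns = [
    ("?m", "rdf:type", "i:Movie"),
    ("?m", "i:title", "?t"),
    ("?m", "i:year", "?y"),
]
print(evaluate_bgp(patterns, graph))
```

Only the first movie appears in the result: ?y cannot be bound for the second one, so the whole basic graph pattern fails for it.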
Graph Patterns: Example

Basic graph pattern
Titles and years of all movies

    PREFIX i: <http://db.cz/terms#>
    SELECT ?t ?y
    FROM <http://db.cz/movies>
    WHERE
FROM Clause

• Defines data graphs to be queried

    FROM [ NAMED ] ( IRI reference | prefixed name )

Dataset = collection of graphs to be queried
• One default graph
    Merge of all the declared graphs from unnamed FROM clauses
    Empty when no unnamed FROM clause is provided
• Zero or more named graphs

Active graph = used for the evaluation of graph patterns
• The default graph unless changed using GRAPH graph pattern
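How the dataset is assembled from the FROM and FROM NAMED clauses can be sketched as follows. The store dictionary, its sample triples, and the helper names build_dataset and active_graph are assumptions made for this illustration; merging is approximated by a plain set union, which ignores blank node renaming.

```python
def build_dataset(store, from_iris=(), from_named_iris=()):
    """Return (default graph, named graphs) for the given dataset clauses."""
    default_graph = set()
    for iri in from_iris:  # merge of all graphs from unnamed FROM clauses
        default_graph |= store[iri]
    named_graphs = {iri: store[iri] for iri in from_named_iris}
    return default_graph, named_graphs


def active_graph(default_graph, named_graphs, graph_iri=None):
    """The default graph is active unless a GRAPH pattern selects a named one."""
    return default_graph if graph_iri is None else named_graphs[graph_iri]


# Hypothetical store contents keyed by graph IRI.
store = {
    "http://db.cz/movies": {("m:medvidek", "i:actor", "a:trojan")},
    "http://db.cz/actors": {("a:trojan", "i:lastname", "Trojan")},
}

# FROM <http://db.cz/movies> FROM NAMED <http://db.cz/actors>
default, named = build_dataset(
    store, ["http://db.cz/movies"], ["http://db.cz/actors"]
)
print(active_graph(default, named))
print(active_graph(default, named, "http://db.cz/actors"))
```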
FROM Clause: Example

Names of actors who played in the Medvídek movie

    PREFIX i: <http://db.cz/terms#>
    PREFIX m: <http://db.cz/movies/>
    SELECT ?f ?l
    FROM <http://db.cz/movies>
    FROM <http://db.cz/actors>
    WHERE
Graph Patterns: Graph Graph Pattern

GRAPH graph pattern
Pattern evaluated with respect to a particular named graph

    GRAPH ( variable | IRI reference | prefixed name ) group graph pattern

• Changes the active graph for a given group graph pattern
    GRAPH <http://db.cz/actors> { … }
• We can also consider all the named graphs
    GRAPH ?g { … }
Graph Patterns: Example

Graph graph pattern
Names of actors who played in the Medvídek movie

    PREFIX i: <http://db.cz/terms#>
    PREFIX m: <http://db.cz/movies/>
    SELECT ?f ?l
    FROM <http://db.cz/movies>
    FROM NAMED <http://db.cz/actors>
    WHERE
Variable Assignments

BIND graph pattern
Explicitly assigns a value to a given variable

    BIND ( expression AS variable )

• This variable must not yet be bound!
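The effect of BIND on a solution sequence can be sketched as below. The helper name bind and the sample solution are hypothetical; the sketch checks the "not yet bound" rule at run time for each solution, whereas a SPARQL processor would typically reject such a query before evaluating it.

```python
def bind(solutions, variable, expression):
    """Extend every solution with variable := expression(solution)."""
    extended = []
    for solution in solutions:
        if variable in solution:
            raise ValueError(f"{variable} is already bound")
        extended.append({**solution, variable: expression(solution)})
    return extended


# Roughly BIND (2017 - ?y AS ?age) over a made-up solution.
solutions = [{"?t": "Samotáři", "?y": 2000}]
print(bind(solutions, "?age", lambda s: 2017 - s["?y"]))
```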
Filter Constraints

FILTER constraints
Impose constraints on variables and their values

    FILTER ( built-in call | ( expression ) )

• Only solutions satisfying the given condition are preserved
• Does not create any new variable bindings!
• Always applied on the entire group graph pattern, i.e. evaluated at the very end
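A minimal sketch of these filter semantics, assuming solutions are dictionaries and the condition is an ordinary Python function. The name apply_filter and the sample solutions are hypothetical. A condition that touches an unbound variable is treated as an evaluation error, which removes the solution, as FILTER does.

```python
def apply_filter(solutions, condition):
    """Keep only the solutions for which the condition evaluates to true."""
    kept = []
    for solution in solutions:
        try:
            if condition(solution):
                kept.append(solution)  # the solution itself is never modified
        except KeyError:
            pass  # unbound variable: evaluation error, solution is dropped
    return kept


# Made-up solutions of the enclosing group graph pattern.
solutions = [
    {"?t": "Samotáři", "?y": 2000},
    {"?t": "Vratné lahve", "?y": 2006},
    {"?t": "Untitled"},  # ?y unbound
]
# Roughly FILTER (?y >= 2005)
print(apply_filter(solutions, lambda s: s["?y"] >= 2005))
```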
Filter Constraints: Example

Movies filmed in 2005 or later in which Ivan Trojan played

    PREFIX i: <http://db.cz/terms#>
    PREFIX a: <http://db.cz/actors/>
    SELECT ?t ?y
    FROM <http://db.cz/movies>
    WHERE
Solution Modifiers

LIMIT clause
• Limits the number of solutions in the query result

    LIMIT integer

OFFSET clause
• Skips a certain number of solutions in the query result

    OFFSET integer
Solution Modifiers: Example

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX i: <http://db.cz/terms#>
    SELECT ?t ?y
    FROM <http://db.cz/movies>
    WHERE
    {
      ?m rdf:type i:Movie ;
         i:title ?t ;
         i:year ?y .
    }
    ORDER BY DESC(?y) ASC(?t)
    OFFSET 1
    LIMIT 5

    ?t            ?y
    Vratné lahve  2006
    Samotáři      2000
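The order in which the modifiers take effect (sorting first, then OFFSET, then LIMIT) can be sketched as follows. The function name apply_modifiers is hypothetical, and the three input solutions are an assumption: the movie graph itself is not shown here, so the first row (Medvídek, 2007) is invented to make the OFFSET visible.

```python
def apply_modifiers(solutions, offset=0, limit=None):
    """ORDER BY DESC(?y) ASC(?t), then OFFSET, then LIMIT."""
    # Stable sorts: secondary key first, then the primary key in descending order.
    ordered = sorted(solutions, key=lambda s: s["?t"])
    ordered = sorted(ordered, key=lambda s: s["?y"], reverse=True)
    end = None if limit is None else offset + limit
    return ordered[offset:end]


# Assumed solutions of the WHERE clause before any modifier is applied.
solutions = [
    {"?t": "Samotáři", "?y": 2000},
    {"?t": "Medvídek", "?y": 2007},
    {"?t": "Vratné lahve", "?y": 2006},
]
for row in apply_modifiers(solutions, offset=1, limit=5):
    print(row["?t"], row["?y"])
```

With this input the newest movie is skipped by OFFSET 1 and the remaining two rows fit within LIMIT 5.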
Aggregation

GROUP BY + HAVING clauses
• Standard aggregation over a solution sequence

    GROUP BY ( variable | built-in call | ( expression [ AS variable ] ) )
    HAVING ( built-in call | ( expression ) )
Aggregation: Example

Numbers of actors in movies with at most 2 actors

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX i: <http://db.cz/terms#>
    SELECT ?t (COUNT(?a) AS ?c)
    FROM <http://db.cz/movies>
    WHERE
    {
      ?m rdf:type i:Movie ;
         i:title ?t ;
         i:actor ?a .
    }
    GROUP BY ?m ?t
    HAVING (COUNT(?a) <= 2)
    ORDER BY ?c ?t

    ?t            ?c
    Medvídek      2
    Vratné lahve  2
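The grouping pipeline of this example (group, aggregate, filter the groups, sort) can be sketched in Python. The function name count_actors and the input solutions are assumptions for illustration: the actor identifiers are made up, and a third movie with three actors is added so that the HAVING condition has something to remove.

```python
from collections import defaultdict


def count_actors(solutions, max_count):
    groups = defaultdict(list)
    for s in solutions:
        groups[(s["?m"], s["?t"])].append(s["?a"])  # GROUP BY ?m ?t
    rows = [(t, len(actors)) for (_, t), actors in groups.items()]  # COUNT(?a) AS ?c
    rows = [row for row in rows if row[1] <= max_count]  # HAVING on the count
    return sorted(rows, key=lambda row: (row[1], row[0]))  # ORDER BY ?c ?t


# Assumed solutions of the WHERE clause (one per movie-actor pair).
solutions = [
    {"?m": "m:lahve", "?t": "Vratné lahve", "?a": "a:1"},
    {"?m": "m:lahve", "?t": "Vratné lahve", "?a": "a:2"},
    {"?m": "m:medvidek", "?t": "Medvídek", "?a": "a:3"},
    {"?m": "m:medvidek", "?t": "Medvídek", "?a": "a:4"},
    {"?m": "m:samotari", "?t": "Samotáři", "?a": "a:5"},
    {"?m": "m:samotari", "?t": "Samotáři", "?a": "a:6"},
    {"?m": "m:samotari", "?t": "Samotáři", "?a": "a:7"},
]
print(count_actors(solutions, max_count=2))
```

Grouping by both ?m and ?t keeps two different movies with the same title apart, which is presumably why the query groups by the movie resource as well.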
Aggregation

Aggregate functions

    COUNT ( [ DISTINCT ] ( expression | * ) )
    ( SUM | MIN | MAX | AVG ) ( [ DISTINCT ] expression )
    GROUP_CONCAT ( [ DISTINCT ] expression [ ; SEPARATOR = string ] )
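The effect of the DISTINCT and SEPARATOR options can be sketched over the values of a single group. The helper names count and group_concat are hypothetical Python stand-ins, not library functions. The default separator of a single space follows the SPARQL 1.1 definition of GROUP_CONCAT, while the concatenation order shown here is simply input order, which SPARQL does not guarantee.

```python
def count(values, distinct=False):
    """COUNT over the bound values of one group."""
    return len(set(values)) if distinct else len(values)


def group_concat(values, distinct=False, separator=" "):
    """GROUP_CONCAT over the string values of one group."""
    if distinct:
        values = list(dict.fromkeys(values))  # drop duplicates, keep first occurrences
    return separator.join(values)


# Made-up values of one variable within a single group.
names = ["Trojan", "Macháček", "Trojan"]
print(count(names), count(names, distinct=True))
print(group_concat(names, distinct=True, separator=", "))
```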
Query Forms

• SELECT
    Finds solutions matching a provided graph pattern
• ASK
    Checks whether at least one solution exists
• DESCRIBE
    Retrieves a graph with data about selected resources
• CONSTRUCT
    Creates a new graph according to a provided pattern
Query Forms: SELECT

SELECT query form
Finds solutions matching a provided graph pattern

    prologue declarations
    SELECT clause
    FROM clause
    WHERE clause
    solution modifiers

Result
• Solution sequence = ordered multiset of solutions
Query Forms: CONSTRUCT

CONSTRUCT query form
Creates a new graph according to a provided pattern

    prologue declarations
    CONSTRUCT { triples construction }
    FROM clause
    WHERE clause
    solution modifiers

Result
• RDF graph constructed by instantiating the triples template for each solution of the group graph pattern
    Triples containing unbound variables and otherwise invalid triples are left out
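How a CONSTRUCT result is assembled can be sketched as template instantiation. The function name construct, the template, the property names i:name and i:born, and the sample solution are hypothetical. Only the unbound-variable rule is modelled; checks for otherwise invalid triples (for example a literal in the subject position) are not.

```python
def construct(template, solutions):
    """Instantiate every template triple for every solution; the result is a set of triples."""
    graph = set()
    for solution in solutions:
        for pattern in template:
            try:
                triple = tuple(
                    solution[term] if term.startswith("?") else term
                    for term in pattern
                )
            except KeyError:
                continue  # a variable is unbound: this triple is left out
            graph.add(triple)
    return graph


# Roughly CONSTRUCT { ?a i:name ?l . ?a i:born ?b } over one made-up solution.
template = [("?a", "i:name", "?l"), ("?a", "i:born", "?b")]
solutions = [{"?a": "a:trojan", "?l": "Trojan"}]  # ?b is unbound
print(construct(template, solutions))
```

Because the result is a set, the same triple produced by several solutions appears only once in the constructed graph.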