CumulusRDF
Linked Data Management on Nested Key-Value Stores
Günter Ladwig, Andreas Harth
Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2011)
KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association
Institute of Applied Informatics and Formal Description Methods (AIFB)
www.kit.edu
Contents: Introduction, Linked Data, Apache Cassandra, CumulusRDF, Storage Layouts, Storage Model, Hierarchical Layout, Flat Layout
Institute of Applied Informatics and Formal Description Methods (AIFB), October 24th, 2011
Triple Patterns
A triple pattern is an RDF triple that may contain variables instead of RDF terms in any position, for example
?s dbpprop:birthPlace dbpedia:Karlsruhe .
or
?s foaf:name ?o .
A Linked Data lookup on a term t translates into two triple pattern lookups:
(t ? ?) and (? ? t)
At least three indexes are needed to cover all possible triple patterns (with prefix lookups)
SSWS 2011, Bonn
Pattern   Index
? ? ?     Any
s ? ?     SPO
? p ?     POS
? ? o     OSP
s p ?     SPO
? p o     POS
s ? o     OSP
s p o     Any
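The pattern-to-index table can be sketched as a small lookup function. This is only an illustration of the mapping on the slide, not CumulusRDF code: the names choose_index and linked_data_lookup_patterns are hypothetical, and None is assumed to stand for a variable position.

```python
# Hypothetical helper (not from the slides): pick an index for a triple pattern.
# None marks a variable position; any other value is a bound RDF term.
INDEX_FOR_PATTERN = {
    # (s bound, p bound, o bound): index
    (False, False, False): "Any",
    (True, False, False): "SPO",
    (False, True, False): "POS",
    (False, False, True): "OSP",
    (True, True, False): "SPO",
    (False, True, True): "POS",
    (True, False, True): "OSP",
    (True, True, True): "Any",
}


def choose_index(s, p, o):
    """Return the index name that serves the pattern with a prefix lookup."""
    bound = (s is not None, p is not None, o is not None)
    return INDEX_FOR_PATTERN[bound]


def linked_data_lookup_patterns(t):
    """A Linked Data lookup on t becomes the patterns (t ? ?) and (? ? t)."""
    return [(t, None, None), (None, None, t)]
```

Under these assumptions, a Linked Data lookup touches only the SPO and OSP indexes, since its two patterns bind the subject and the object respectively.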
Apache Cassandra
Open source data management system
Distributed key-value store (DHT-based)
Nested key-value data model
Schema-less
Decentralized: every node in the cluster has the same role; no single point of failure
Elastic: throughput increases linearly as machines are added, with no downtime
Fault-tolerant: data can be replicated
CumulusRDF
CumulusRDF Functionality
Distributed deployment to enable scale (more data and also more clients) by adding more machines (via Cassandra)
Geographical replication (via Cassandra)
Write-optimized indices with eventual consistency (via Cassandra)
Triple pattern lookups (via CumulusRDF index structures)
Linked Data Lookups (via CumulusRDF index structures)
STORAGE LAYOUTS
Nested Key-Value Storage Model
Column-only { row-key : { column : value } }
Super columns { row-key : { supercolumn : { column : value } } }
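As a rough mental model, the two shapes above can be written as nested Python dictionaries. This is a sketch only: it ignores distribution, column ordering and Cassandra's actual API, and the helper names get_column and get_sub_column are made up for illustration.

```python
# Column-only model: { row-key : { column : value } }
column_only = {
    "r0": {"c00": "v00", "c01": "v01"},
    "r1": {"c10": "v10", "c11": "v11"},
}

# Super column model: { row-key : { supercolumn : { column : value } } }
super_columns = {
    "r2": {"sc00": {"c000": "v000"}, "sc01": {"c010": "v010"}},
    "r3": {"sc00": {"c000": "v000"}, "sc01": {"c010": "v010"}},
}


def get_column(store, row_key, column):
    """Two-level access: row key, then column key."""
    return store[row_key][column]


def get_sub_column(store, row_key, super_column, column):
    """Three-level access: row key, super column key, then column key."""
    return store[row_key][super_column][column]
```

The extra nesting level of super columns is what lets a triple's three terms occupy three key positions, while the column-only model offers only two.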
[Figure: column-only rows r0 and r1, each mapping column keys (c00, c01, ...) to column values (v00, v01, ...); super column rows r2 and r3, each with super column keys sc00 and sc01 that contain columns (c000 : v000, c010 : v010, ...).]
Nested Key-Value Storage Model
Secondary indexes map column values to rows: { value : row-key }
Cassandra limitations:
Entire rows are always stored on a single node
No range queries on row keys
Columns are stored in a specified order and allow for range queries
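The two access paths mentioned here, a secondary index from values to row keys and a range query over ordered columns, can be illustrated in memory. This is a simplified sketch, not Cassandra's implementation: build_secondary_index and column_range are hypothetical helpers, and the index here is built over all columns of a row at once.

```python
from bisect import bisect_left
from collections import defaultdict


def build_secondary_index(rows):
    """Invert {row_key: {column: value}} into {value: sorted row keys}."""
    index = defaultdict(set)
    for row_key, columns in rows.items():
        for value in columns.values():
            index[value].add(row_key)
    return {value: sorted(keys) for value, keys in index.items()}


def column_range(columns, start, end):
    """Return (column, value) pairs with start <= column < end, in column order."""
    keys = sorted(columns)  # stands in for the store's column ordering
    lo = bisect_left(keys, start)
    hi = bisect_left(keys, end)
    return [(key, columns[key]) for key in keys[lo:hi]]
```

Because range queries are available on column keys but not on row keys, the layouts that follow place the terms that need prefix lookups into column positions.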
Hierarchical Layout
Uses super columns
RDF terms occupy row, super column and column positions
Value is empty
Three indexes SPO, POS, OSP cover all possible triple patterns
Example: SPO index
SPO: { s : { p : { o : - } } }
Row key    Super column key   Column key   Value
dbp:Jaws   foaf:name          "Jaws"       -
           rdf:type           dbp:Film     -
                              dbp:Work     -
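The SPO index of the hierarchical layout can be sketched with nested dictionaries, using the dbp:Jaws example. This is an in-memory illustration under the assumption that a three-level dictionary is an adequate stand-in for row, super column and column; the class HierarchicalSPO is hypothetical and not part of CumulusRDF.

```python
from collections import defaultdict


class HierarchicalSPO:
    """Sketch of SPO: { s : { p : { o : - } } } with an empty value."""

    def __init__(self):
        # row key (s) -> super column key (p) -> column key (o) -> empty value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def insert(self, s, p, o):
        self.rows[s][p][o] = ""  # the value carries no information

    def lookup(self, s, p=None):
        """Answer (s ? ?) when p is None, otherwise (s p ?)."""
        row = self.rows.get(s, {})
        if p is None:
            return sorted((s, pred, o) for pred, objs in row.items() for o in objs)
        return sorted((s, p, o) for o in row.get(p, {}))


spo = HierarchicalSPO()
spo.insert("dbp:Jaws", "foaf:name", '"Jaws"')
spo.insert("dbp:Jaws", "rdf:type", "dbp:Film")
spo.insert("dbp:Jaws", "rdf:type", "dbp:Work")
```

The POS and OSP indexes would follow the same shape with the term positions rotated.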
Flat Layout
Uses columns only
Range queries on column keys allow prefix lookups
Concatenate second & third position to form column key
SPO: { s : { po : - } }
po is the concatenation of predicate and object
For (s p ?) we perform a prefix lookup on p in the row with key s
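The prefix lookup for (s p ?) can be sketched with a sorted list of concatenated column keys. The slides do not say how predicate and object are joined, so the separator SEP below is an assumption, as is the requirement that terms never contain it; the class FlatSPO is hypothetical.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator between p and o; not specified in the slides


class FlatSPO:
    """Sketch of SPO: { s : { po : - } } with sorted column keys per row."""

    def __init__(self):
        self.rows = {}  # s -> sorted list of "p" + SEP + "o" column keys

    def insert(self, s, p, o):
        columns = self.rows.setdefault(s, [])
        key = p + SEP + o
        i = bisect_left(columns, key)
        if i == len(columns) or columns[i] != key:
            columns.insert(i, key)  # keep column keys in sorted order

    def lookup_sp(self, s, p):
        """Answer (s p ?) by scanning the column keys that start with p + SEP."""
        columns = self.rows.get(s, [])
        prefix = p + SEP
        i = bisect_left(columns, prefix)
        result = []
        while i < len(columns) and columns[i].startswith(prefix):
            result.append((s, p, columns[i][len(prefix):]))
            i += 1
        return result
```

Including the separator in the prefix keeps a lookup on one predicate from matching another predicate that merely starts with the same characters.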
1M sampled S, SP, SPO, SO, and O patterns from dataset
Output: all matching triples
Linked Data lookup queries
2M resource lookups from DBpedia logs (1.2M unique)
Output: all triples with URI as subject and 10k triples with URI as object
Evaluation
[Figure: evaluation setup with clients, CumulusRDF, and Cassandra nodes C0-C3; residual labels: 10k, all.]