Translating openCypher Queries to SQL
Working Draft
MÁRTON ELEKES, JÁNOS BENJAMIN ANTAL, JÓZSEF MARTON, and GÁBOR SZÁRNYAS
In the last decade, a lot of database management systems were developed with different NoSQL techniques. One group of these systems are graph databases, which allow users to store and query their data as graphs. This data model is often a better fit to represent strongly interlinked data sets than the traditional relational model, and its conciseness can lead to better performance. That said, relational databases have been developed and optimized for almost 50 years, and it is an open question whether efficient processing of graph data requires specialized databases.
New query languages, such as openCypher, were developed for querying and processing graph data. These languages usually offer a more intuitive way to express graph queries than SQL-like languages. However, most enterprises still store their data in traditional relational databases, which necessitates loading their data to graph databases. This is often impractical or infeasible for production databases. Our goal is to allow using expressive graph query languages and leverage the performance of existing relational databases while avoiding the overhead of transferring the data between different systems. To this end, we developed a transpiler which can transform openCypher graph queries into SQL.
Comparing the performance of database systems requires standard benchmarks. For relational databases, this is fulfilled by the benchmarks of the Transaction Processing Performance Council. Due to the relative immaturity of graph databases, there is only a limited number of benchmarks available for graph query workloads. We joined the development of the LDBC (Linked Data Benchmark Council) Social Network Benchmark. We reworked and significantly improved existing implementations of the Interactive Workload, then performed a thorough evaluation and detailed analysis of database systems.
ACM Reference Format:
Márton Elekes, János Benjamin Antal, József Marton, and Gábor Szárnyas. 2019. Translating openCypher Queries to SQL: Working Draft. 1, 1 (March 2019), 11 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 PRELIMINARIES
1.1 Case study
Social network in Figure 1
MATCH (p:Person)
OPTIONAL MATCH (p)-[:KNOWS]->(f:Person)
RETURN p.name, count(f)
Listing 1. Cypher example code
Manuscript submitted to ACM
id  name   age
a   Alice  24
b   Bob    53
(a) Persons

personId
a
(b) Students

id  topic
c   Neofolk
(c) Tags

id  subject
d   Folk
e   Music
f   Art
(d) Classes

tag  class
c    d
(e) HasClass

src  trg
d    e
e    f
(f) SubclassOf

src  trg  since
a    b    2014
(g) Knows

person  tag  level
a       d    4
(h) Interests

personId  lang
a         en
b         de
b         en
(i) Speaks
Fig. 3. Relational model for the example
[Figure: three boxes labelled Objects, Relation, and Graph, connected by Object-Relational Mapping, Object-Graph Mapping, and Graph-Relational Mapping.]
Fig. 4. Mapping between object-oriented, relational, and graph data models
vertex(vertex_id BIGINT)
edge(edge_id BIGINT, from BIGINT, to BIGINT, type TEXT)
label(parent BIGINT, name TEXT)
vertex_property(parent BIGINT, key TEXT, value JSONB)
edge_property(parent BIGINT, key TEXT, value JSONB)
Fig. 5. Relational schema for representing property graphs. Key attributes are denoted with , foreign keys are denoted with double underline, many-to-one relationships are denoted with
Example. The example query in SQL, based on the tables of Figure 3:
SELECT p.name, COUNT(f.id)
FROM persons AS p
LEFT JOIN knows ON p.id = knows.src
LEFT JOIN persons AS f ON knows.trg = f.id
GROUP BY p.id, p.name;
Listing 2. SQL example code
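The translation of OPTIONAL MATCH can be tried out on the Persons and Knows tables of Figure 3. The following is a minimal sketch, not part of the original draft: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the relational database, and the name OPTIONAL_MATCH_SQL is only illustrative. Both joins are outer joins and the rows are grouped per person, so that a person without an outgoing Knows row is kept with a count of 0.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the relational store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE knows (src TEXT, trg TEXT, since INTEGER);
    INSERT INTO persons VALUES ('a', 'Alice', 24), ('b', 'Bob', 53);
    INSERT INTO knows VALUES ('a', 'b', 2014);
""")

# MATCH (p:Person) OPTIONAL MATCH (p)-[:KNOWS]->(f:Person)
# RETURN p.name, count(f)
OPTIONAL_MATCH_SQL = """
    SELECT p.name, COUNT(f.id)
    FROM persons AS p
    LEFT JOIN knows ON p.id = knows.src
    LEFT JOIN persons AS f ON knows.trg = f.id
    GROUP BY p.id, p.name
    ORDER BY p.name
"""

rows = conn.execute(OPTIONAL_MATCH_SQL).fetchall()
print(rows)
```

COUNT(f.id) ignores the NULL produced by the outer join, which is what makes the count for Bob zero rather than one.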
2 MAPPING GRAPH QUERIES TO RELATIONAL DATABASES
2.1 Requirements for mapping
The Cypher language has some language constructs that are only supported by a few SQL dialects.
vertex_id
a
b
c
d
e
f
(a) vertex relation

parent  name
a       Person
a       Student
b       Person
c       Tag
d       Class
e       Class
f       Class
(b) label relation

parent  key     value
a       name    "Alice"
a       age     24
a       speaks  ["en"]
b       name    "Bob"
b       age     53
b       speaks  ["en", "de"]
...
(c) vertex_property relation

edge_id  from  to  type
1        a     b   KNOWS
2        a     c   INTEREST
3        c     d   CLASS
4        d     e   SUBCLASS_OF
5        e     f   SUBCLASS_OF
(d) edge relation

parent  key    value
1       since  2014
2       level  4
(e) edge_property relation
Fig. 6. Relations representing the example property graph
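As a rough illustration of the generic schema, the sketch below loads the rows shown in Figure 6 into an in-memory database and reads the properties of one vertex back. It is not taken from the draft and rests on several assumptions: SQLite replaces PostgreSQL, property values are stored as JSON-encoded TEXT instead of JSONB, vertex ids are TEXT because the figure uses letters, and the helper vertex_properties is hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (vertex_id TEXT PRIMARY KEY);
    CREATE TABLE label (parent TEXT, name TEXT);
    CREATE TABLE vertex_property (parent TEXT, key TEXT, value TEXT);
    CREATE TABLE edge (edge_id INTEGER PRIMARY KEY,
                       "from" TEXT, "to" TEXT, type TEXT);
    CREATE TABLE edge_property (parent INTEGER, key TEXT, value TEXT);
""")

# Rows copied from Figure 6; rows elided there are not guessed here.
conn.executemany("INSERT INTO vertex VALUES (?)", [(v,) for v in "abcdef"])
conn.executemany("INSERT INTO label VALUES (?, ?)", [
    ("a", "Person"), ("a", "Student"), ("b", "Person"), ("c", "Tag"),
    ("d", "Class"), ("e", "Class"), ("f", "Class")])
conn.executemany(
    "INSERT INTO vertex_property VALUES (?, ?, ?)",
    [(parent, key, json.dumps(value)) for parent, key, value in [
        ("a", "name", "Alice"), ("a", "age", 24), ("a", "speaks", ["en"]),
        ("b", "name", "Bob"), ("b", "age", 53), ("b", "speaks", ["en", "de"])]])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)", [
    (1, "a", "b", "KNOWS"), (2, "a", "c", "INTEREST"), (3, "c", "d", "CLASS"),
    (4, "d", "e", "SUBCLASS_OF"), (5, "e", "f", "SUBCLASS_OF")])
conn.executemany(
    "INSERT INTO edge_property VALUES (?, ?, ?)",
    [(1, "since", json.dumps(2014)), (2, "level", json.dumps(4))])


def vertex_properties(connection, vertex_id):
    """Collect the key/value rows of one vertex into a dict (hypothetical helper)."""
    rows = connection.execute(
        "SELECT key, value FROM vertex_property WHERE parent = ?", (vertex_id,))
    return {key: json.loads(value) for key, value in rows}


print(vertex_properties(conn, "b"))
```

Storing one row per property keeps the schema independent of the graph's labels and property names, at the cost of one lookup per accessed property.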
Fig. 7. Translating Cypher queries to SQL
Handling arrays.
Dynamic typing.
Recursive queries.
2.2 Mapping Cypher queries to SQL (C2S)
2.2.1 Mapping workflow.
2.2.2 Get-vertices.
Example. Collect all students’ names and ids
MATCH (p:Person:Student)
RETURN p, p.name

p  p.name
a  Alice
$\left(\bigcirc^{\text{Person},\,\text{Student}}_{p,\;p.\text{name}}\right)$
SELECT vertex_id AS "p",
  (SELECT "value" FROM vertex_property
   WHERE parent = vertex_id AND key = 'name') AS "p.name"
FROM vertex
WHERE NOT EXISTS(VALUES ('Person'), ('Student')
                 EXCEPT ALL
                 SELECT name FROM label WHERE parent = vertex_id)
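The label condition in this query reads as "the set of required labels minus the labels of the vertex is empty". A small sketch of that check follows; it is an illustration under stated assumptions rather than the transpiler's actual output. SQLite stands in for PostgreSQL and offers no EXCEPT ALL, so plain EXCEPT is used, which is equivalent here because the test only asks whether the difference is empty. The name GET_VERTICES_SQL is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (vertex_id TEXT PRIMARY KEY);
    CREATE TABLE label (parent TEXT, name TEXT);
    INSERT INTO vertex VALUES ('a'), ('b'), ('c');
    INSERT INTO label VALUES
        ('a', 'Person'), ('a', 'Student'), ('b', 'Person'), ('c', 'Tag');
""")

# MATCH (p:Person:Student) RETURN p
# A vertex qualifies when no required label is missing from its label rows.
GET_VERTICES_SQL = """
    SELECT vertex_id
    FROM vertex
    WHERE NOT EXISTS (
        VALUES ('Person'), ('Student')
        EXCEPT
        SELECT name FROM label WHERE parent = vertex_id)
    ORDER BY vertex_id
"""

students = [row[0] for row in conn.execute(GET_VERTICES_SQL)]
print(students)
```

Bob carries only the Person label, so the difference for him still contains Student and he is filtered out.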
Example. Collect students, their interests and interest levels
MATCH (s:Student)-[i:INTEREST]->(t)
RETURN s, i, i.level, t, t.topic

s  i  i.level  t  t.topic
a  2  4        c  Neofolk
$\left(\bigcirc^{\text{Student}}_{s} \overset{\text{INTEREST}}{\underset{i,\;i.\text{level}}{\longrightarrow}} \bigcirc_{t,\;t.\text{topic}}\right)$
SELECT "from" AS "s", edge_id AS "i", "to" AS "t",
  (SELECT "value" FROM edge_property
   WHERE parent = edge_id AND key = 'level') AS "i.level",
  (SELECT "value" FROM vertex_property
   WHERE parent = "to" AND key = 'topic') AS "t.topic"
FROM edge
WHERE type IN ('INTEREST') AND
  NOT EXISTS(VALUES ('Student')
             EXCEPT ALL
             SELECT name FROM label WHERE parent = "from")
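$\left(\bigcirc^{L_1}_{v} \overset{T}{\underset{e}{\longleftrightarrow}} \bigcirc^{L_2}_{w}\right) \equiv \left(\bigcirc^{L_1}_{v} \overset{T}{\underset{e}{\longrightarrow}} \bigcirc^{L_2}_{w}\right) \cup \left(\bigcirc^{L_2}_{w} \overset{T}{\underset{e}{\longrightarrow}} \bigcirc^{L_1}_{v}\right)$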
2.2.4 Selection and Projection.
Example. Unique names of persons under 30 years old
MATCH (p:Person)
WHERE p.age < 30
RETURN DISTINCT p.name AS name
WITH q0 AS ( -- GetVertices
  SELECT vertex_id AS "p",
    (SELECT value FROM vertex_property
     WHERE parent = vertex_id AND key = 'age') AS "p.age",
    (SELECT value FROM vertex_property
     WHERE parent = vertex_id AND key = 'name') AS "p.name"
  FROM vertex
  WHERE NOT EXISTS(VALUES ('Person')
    EXCEPT ALL SELECT name FROM label WHERE parent = vertex_id)),
q1 AS ( -- Selection
  SELECT * FROM q0 WHERE "p.age" < 30),
q2 AS ( -- Projection
  SELECT "p.name" AS "name" FROM q1)
-- DuplicateElimination
SELECT DISTINCT * FROM q2
2.2.5 Grouping and unwinding.
Example. Collect the languages and the name of their speakers
MATCH (p:Person)
WITH p, p.name AS name
UNWIND p.speaks AS lang
RETURN lang, collect(name) AS speakers
WITH q0 AS ( /* GetVertices: (p:Person) | attributes: p.speaks, p.name */ ),
q1 AS ( /* Projection: p, p.speaks, p.name AS name */ ),
q2 AS ( -- Unwind
  SELECT "p", "name", unnest("p.speaks") AS "lang"
  FROM q1)
-- Grouping
SELECT "lang", array_agg("name") AS "speakers"
FROM q2
GROUP BY "lang"
2.2.6 Natural join.
Example. Friends of friends of Alice
MATCH (p:Person {name: 'Alice'})<-[e1:KNOWS]->(f)<-[e2:KNOWS]->(foaf)
RETURN foaf.name
WITH q0 AS ( -- GetEdges: (p:Person)-[e1:KNOWS]->(f) | attributes: p.name
  SELECT "from" AS "p", edge_id AS "e1", "to" AS "f",⋅⋅⋅ AS "p.name" FROM edge ⋅⋅⋅),
q1 AS ( -- GetEdges: (p:Person)<-[e1:KNOWS]-(f) | attributes: p.name
  SELECT "from" AS "f", edge_id AS "e1", "to" AS "p",⋅⋅⋅ AS "p.name" FROM edge ⋅⋅⋅),
q2 AS ( -- UnionAll: q0 ∪ q1
  SELECT ⋅⋅⋅ FROM q0 UNION ALL SELECT ⋅⋅⋅ FROM q1),
q3 AS ( -- Selection: p.name = 'Alice'
  SELECT * FROM q2 WHERE ("p.name" = 'Alice')),
q4 AS ( -- GetEdges: (f)-[e2:KNOWS]->(foaf) | attributes: foaf.name
  SELECT "from" AS "f", edge_id AS "e2", "to" AS "foaf",⋅⋅⋅ FROM edge ⋅⋅⋅),
q5 AS ( -- GetEdges: (f)<-[e2:KNOWS]-(foaf) | attributes: foaf.name
  SELECT "from" AS "foaf", edge_id AS "e2", "to" AS "f",⋅⋅⋅ FROM edge ⋅⋅⋅),
q6 AS ( -- UnionAll: q4 ∪ q5
  SELECT ⋅⋅⋅ FROM q4 UNION ALL SELECT ⋅⋅⋅ FROM q5),
q7 AS ( -- Join
  SELECT "left"."p", "left"."p.name", "left"."e1", "left"."f",
         "right"."e2", "right"."foaf", "right"."foaf.name"
  FROM q3 AS "left" INNER JOIN q6 AS "right" ON "left"."f" = "right"."f"),
q8 AS ( -- AllDifferent
  SELECT * FROM q7 WHERE is_unique(ARRAY["e1", "e2"]))
-- Projection
SELECT "foaf.name" FROM q8
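The AllDifferent step matters because Cypher does not let one pattern match traverse the same edge twice. The sketch below illustrates its effect on the single KNOWS edge of Figure 6. It is written under assumptions that are not in the draft: SQLite is used instead of PostgreSQL, and since SQLite has no array type, is_unique is registered as a variadic user-defined function whose implementation here is only a guess at the intended semantics (all arguments pairwise distinct). The names undirected and joined are illustrative.

```python
import sqlite3


def is_unique(*edge_ids):
    """Return 1 if all edge ids differ pairwise, else 0 (assumed semantics)."""
    return int(len(set(edge_ids)) == len(edge_ids))


conn = sqlite3.connect(":memory:")
conn.create_function("is_unique", -1, is_unique)  # -1: any number of arguments
conn.executescript("""
    CREATE TABLE edge (edge_id INTEGER PRIMARY KEY,
                       "from" TEXT, "to" TEXT, type TEXT);
    INSERT INTO edge VALUES (1, 'a', 'b', 'KNOWS');
""")

# Two undirected KNOWS hops starting from vertex 'a' (Alice).
FOAF_SQL = """
    WITH undirected AS (
        SELECT "from" AS src, edge_id, "to" AS trg FROM edge WHERE type = 'KNOWS'
        UNION ALL
        SELECT "to" AS src, edge_id, "from" AS trg FROM edge WHERE type = 'KNOWS'),
    joined AS (
        SELECT l.src AS p, l.edge_id AS e1, l.trg AS f,
               r.edge_id AS e2, r.trg AS foaf
        FROM undirected AS l INNER JOIN undirected AS r ON l.trg = r.src
        WHERE l.src = 'a')
    SELECT foaf FROM joined {filter}
"""

without_check = conn.execute(FOAF_SQL.format(filter="")).fetchall()
with_check = conn.execute(
    FOAF_SQL.format(filter="WHERE is_unique(e1, e2)")).fetchall()
print(without_check, with_check)
```

Without the filter, the join walks edge 1 from Alice to Bob and back again, reporting Alice as her own friend of a friend; with the filter that row is dropped.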
2.2.7 Antijoin.
Example. Categories without a superclass
MATCH (c:Class)
WHERE NOT (c)-[:SUBCLASS_OF]->()
RETURN c.subject
WITH q0 AS ( /* GetVertices: (c:Class) | attributes: c.subject */ ),
q1 AS ( /* GetEdges: (c)-[e1:SUBCLASS_OF]->(v1) */ ),
q2 AS ( -- AntiJoin
  SELECT * FROM q0 AS "left"
  WHERE NOT EXISTS(
    SELECT 1 FROM q1 AS "right"
    WHERE "left"."c" = "right"."c"))
-- Projection
SELECT "c.subject" FROM q2
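To make the antijoin concrete, the following sketch fills in the two elided subqueries in a simplified form and runs the query on the Class vertices of the example. It is an assumption-laden illustration, not the transpiler's output: SQLite replaces PostgreSQL, the subject values are taken from Figure 3 and stored as plain TEXT, and q0 is simplified to a join because every vertex here has exactly one label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE label (parent TEXT, name TEXT);
    CREATE TABLE vertex_property (parent TEXT, key TEXT, value TEXT);
    CREATE TABLE edge (edge_id INTEGER PRIMARY KEY,
                       "from" TEXT, "to" TEXT, type TEXT);
    INSERT INTO label VALUES ('d', 'Class'), ('e', 'Class'), ('f', 'Class');
    INSERT INTO vertex_property VALUES
        ('d', 'subject', 'Folk'), ('e', 'subject', 'Music'), ('f', 'subject', 'Art');
    INSERT INTO edge VALUES
        (4, 'd', 'e', 'SUBCLASS_OF'), (5, 'e', 'f', 'SUBCLASS_OF');
""")

# MATCH (c:Class) WHERE NOT (c)-[:SUBCLASS_OF]->() RETURN c.subject
ANTIJOIN_SQL = """
    WITH q0 AS (  -- GetVertices: (c:Class) with attribute c.subject
        SELECT l.parent AS c, vp.value AS "c.subject"
        FROM label AS l
        JOIN vertex_property AS vp
          ON vp.parent = l.parent AND vp.key = 'subject'
        WHERE l.name = 'Class'),
    q1 AS (  -- GetEdges: (c)-[:SUBCLASS_OF]->()
        SELECT "from" AS c FROM edge WHERE type = 'SUBCLASS_OF')
    SELECT "c.subject" FROM q0 AS "left"
    WHERE NOT EXISTS (
        SELECT 1 FROM q1 AS "right" WHERE "left".c = "right".c)
"""

roots = [row[0] for row in conn.execute(ANTIJOIN_SQL)]
print(roots)
```

Only vertex f has no outgoing SUBCLASS_OF edge, so it is the only row of q0 that survives the NOT EXISTS test.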
2.2.8 Transitive join.
Example. Persons reachable from Bob in at most 6 steps
MATCH (p:Person {name: 'Bob'})<-[el:KNOWS*1..6]->(foaf)
RETURN foaf.name
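One common way to express such a bounded traversal in SQL is a recursive common table expression. The sketch below is not the translation used in the draft; it is a hedged illustration that assumes SQLite, tracks the edges of each path in a comma-separated string so that no edge is traversed twice, and uses hypothetical data: edge 1 comes from Figure 6, while vertices g and h and edges 6 and 7 are invented for the example. The names undirected, reach, and REACHABLE_SQL are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edge (edge_id INTEGER PRIMARY KEY,
                       "from" TEXT, "to" TEXT, type TEXT);
    -- Edge 1 is from Figure 6; edges 6 and 7 are hypothetical test data.
    INSERT INTO edge VALUES
        (1, 'a', 'b', 'KNOWS'), (6, 'b', 'g', 'KNOWS'), (7, 'g', 'h', 'KNOWS');
""")

REACHABLE_SQL = """
    WITH RECURSIVE
    undirected(src, edge_id, trg) AS (
        SELECT "from", edge_id, "to" FROM edge WHERE type = 'KNOWS'
        UNION ALL
        SELECT "to", edge_id, "from" FROM edge WHERE type = 'KNOWS'),
    reach(vertex, depth, path) AS (
        -- Base case: paths of length 1 leaving the start vertex.
        SELECT trg, 1, ',' || edge_id || ','
        FROM undirected WHERE src = :start
        UNION ALL
        -- Recursive case: extend a path by one edge not yet on it.
        SELECT u.trg, r.depth + 1, r.path || u.edge_id || ','
        FROM reach AS r JOIN undirected AS u ON u.src = r.vertex
        WHERE r.depth < :max_hops
          AND instr(r.path, ',' || u.edge_id || ',') = 0)
    SELECT vertex, depth FROM reach ORDER BY depth, vertex
"""

rows = conn.execute(REACHABLE_SQL, {"start": "b", "max_hops": 6}).fetchall()
print(rows)
```

The query returns one row per path rather than one row per vertex, which mirrors how Cypher reports variable-length matches.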
WITH q0 AS ( /* GetVertices: (p:Person) */ ),
q1 AS ( -- GenerateId
  SELECT *, nextval('vertex_seq') AS "c" FROM q0),
q2 AS ( -- InsertVertex
  INSERT INTO vertex SELECT "c" AS vertex_id FROM q1),
q3 AS ( -- InsertLabels
  INSERT INTO label SELECT q1."c" AS parent, labels.l AS name
  FROM q1, (VALUES ('Person'), ('Student')) AS labels(l)),
q4 AS ( -- InsertVertexProperty
  INSERT INTO vertex_property
  SELECT "c" AS parent, 'name' AS key, 'Carol' AS value FROM q1),
q5 AS ( -- GenerateId
  SELECT *, nextval('edge_seq') AS "k" FROM q1),
q6 AS ( -- InsertEdge
  INSERT INTO edge
  SELECT "k" AS edge_id, "p" AS "from", "c" AS "to", 'KNOWS' AS type FROM q5),
q7 AS ( -- InsertEdgeProperty
  INSERT INTO edge_property
  SELECT "k" AS parent, 'since' AS key, 2018 AS value FROM q5)
SELECT "p", "k", "c" FROM q5
2.3 Mapping queries to a predefined schema
2.3.1 Nullary operators on a given schema.
Fig. 8. Example use of the gTop schema description approach (based on [3])
Example. Nodes labelled Message and their content properties
WITH q0 AS ( -- GetVertices
  SELECT make_vertex_id('post', "id") AS "m",
         "content" AS "m.content"
  FROM "post"),
q1 AS ( -- GetVertices
  SELECT make_vertex_id('comment', "id") AS "m",
         "content" AS "m.content"
  FROM "comment"),
q2 AS ( -- UnionAll
  SELECT "m", "m.content" FROM q0
  UNION ALL
  SELECT "m", "m.content" FROM q1)
-- Projection
SELECT "m.content" FROM q2
WITH q0 AS ( -- GetEdges
  SELECT
    make_vertex_id('person', edgeTable."personId") AS "p",
    make_edge_id('LIKES', edgeTable."personId", edgeTable."postId") AS "l",
    make_vertex_id('post', edgeTable."postId") AS "m",
    toTable."content" AS "m.content",
    edgeTable."creationDate" AS "l.creationDate"
  FROM "person_LIKES_post" edgeTable
  JOIN "post" toTable ON (edgeTable."postId" = toTable."id")),
q1 AS ( /* GetEdges: (p:Person)-[l:LIKES]->(m:Comment)
           attributes: m.content, l.creationDate */ ),
q2 AS ( /* UnionAll: q0 ∪ q1 */ )
-- Projection
SELECT "m.content", "l.creationDate" FROM q2
3 EVALUATION
Figures 9 and 10 show preliminary benchmark results.
[Figure legend: PostgreSQL; Cypher-to-SQL transpiler on PostgreSQL]
Fig. 10. Individual query execution times for the complex read queries of the LDBC SNB Interactive workload (hand-coded PostgreSQL queries vs. transpiled ones)
REFERENCES
[1] Nadime Francis et al. 2018. Cypher: An Evolving Query Language for Property Graphs. In SIGMOD Conference. ACM, 1433–1445.
[2] József Marton, Gábor Szárnyas, and Dániel Varró. 2017. Formalising openCypher Graph Queries in Relational Algebra. In ADBIS (Lecture Notes in Computer Science), Vol. 10509. Springer, 182–196. https://doi.org/10.1007/978-3-319-66917-5_13
[3] Benjamin A. Steer, Alhamza Alnaimi, Marco A. B. F. G. Lotz, Félix Cuadrado, Luis M. Vaquero, and Joan Varvenne. 2017. Cytosm: Declarative Property Graph Queries Without Data Migration. In GRADES@SIGMOD. ACM, 4:1–4:6.
[4] Gábor Szárnyas, József Marton, János Maginecz, and Dániel Varró. 2018. Reducing Property Graph Queries to Relational Algebra for Incremental View Maintenance. CoRR abs/1806.07344 (2018). arXiv:1806.07344 http://arxiv.org/abs/1806.07344