GRAPH DATABASE Ernesto Damiani and Paolo Ceravolo [email protected] Università degli Studi di Milano Dipartimento di Informatica

GRAPH DATABASE - unimi.it‣ Cypher is an expressive (yet compact) graph database query language ‣ Other graph databases have other means of querying data. Many, including Neo4j,

Jun 21, 2020


WHAT IS A GRAPH?

‣ Formally, a graph is just a collection of vertices and edges

‣ Graphs represent entities as nodes and the ways in which those entities relate as relationships

‣ This general-purpose, expressive structure allows us to model all kinds of scenarios

‣ Graphs are extremely useful in understanding a wide diversity of datasets in fields such as science, government, and business

‣ Represent networks: social structures, topological relationships

‣ Represent a sequence of events

‣ Represent relationships between concepts: hyperonymy, hyponymy, meronymy



THE LABELED GRAPH MODEL

‣ The most popular form of graph model is the Labeled Graph Model

‣ It contains nodes and relationships

‣ Nodes contain properties (key-value pairs)

‣ Nodes can be labeled with one or more labels

‣ Relationships are named and directed, and always have a start and end node

‣ Relationships can also contain properties
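The labeled graph model described above can be sketched with two Python data classes. This is a minimal illustration under stated assumptions, not how any particular database stores its data: the class names `Node` and `Relationship`, the `WORKS_FOR` relationship and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    labels: set[str] = field(default_factory=set)                  # zero or more labels
    properties: dict[str, object] = field(default_factory=dict)    # key-value pairs

@dataclass
class Relationship:
    name: str       # relationships are named...
    start: Node     # ...directed, and always have a start node...
    end: Node       # ...and an end node
    properties: dict[str, object] = field(default_factory=dict)    # relationships carry properties too

alice = Node(1, {"Person", "Employee"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme"})
works_for = Relationship("WORKS_FOR", alice, acme, {"since": 2015})
```

Keeping direct references to the `start` and `end` nodes in each relationship foreshadows the index-free adjacency idea discussed later.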


GRAPH DATABASE MANAGEMENT SYSTEM

‣ A Graph Database Management System is an online database management system

‣ CRUD (Create, Read, Update, and Delete) operations

‣ OLTP (Online Transaction Processing) systems

‣ OLAP (Online Analytical Processing)

‣ Management systems that address scalability are also available


GRAPH DATABASE MANAGEMENT SYSTEM

‣ There are two properties of graph databases we should consider when investigating graph database technologies:

‣ The underlying storage

‣ Some graph databases use native graph storage that is optimised and designed for storing and managing graphs

‣ The processing engine

‣ Native graph processing requires that a graph database use index-free adjacency, meaning that connected nodes physically “point” to each other in the database


GRAPH DATABASE MANAGEMENT SYSTEM

‣ Index-free adjacency

‣ A graph processing engine is said to be native if it implements index-free adjacency

‣ Finding a neighbour through an index costs O(log n) per hop, while following a stored adjacent relationship costs O(1)

‣ The cost of a query therefore depends not on the total size of the graph but on the portion of the graph actually traversed

‣ With index-free adjacency, bidirectional joins are effectively precomputed and stored in the database as relationships
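To make the complexity argument concrete, the sketch below contrasts the two approaches on a tiny hypothetical edge list. A global sorted index searched with `bisect` stands in for an index lookup (O(log n) per hop); `Vertex` objects holding direct references to their neighbours stand in for index-free adjacency (O(1) per hop). Both are simplified assumptions, not a description of any engine's actual storage format.

```python
import bisect

# Hypothetical edge list shared by both approaches.
EDGES = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave")]

# Index-based adjacency: one global sorted index of (source, target) pairs.
# Every hop pays an O(log n) binary search over the whole index.
EDGE_INDEX = sorted(EDGES)

def neighbours_via_index(node: str) -> list[str]:
    # "" sorts before any non-empty name, so this lands on the first (node, *) entry.
    i = bisect.bisect_left(EDGE_INDEX, (node, ""))
    result = []
    while i < len(EDGE_INDEX) and EDGE_INDEX[i][0] == node:
        result.append(EDGE_INDEX[i][1])
        i += 1
    return result

# Index-free adjacency: each vertex stores direct references to its neighbours,
# so a hop is a pointer dereference whose cost does not grow with the graph.
class Vertex:
    def __init__(self, name: str):
        self.name = name
        self.outgoing: list["Vertex"] = []
        self.incoming: list["Vertex"] = []  # kept as well, so reverse hops are cheap too

def build_vertices(edges) -> dict[str, Vertex]:
    vertices: dict[str, Vertex] = {}
    for src, dst in edges:
        for name in (src, dst):
            if name not in vertices:
                vertices[name] = Vertex(name)
        vertices[src].outgoing.append(vertices[dst])
        vertices[dst].incoming.append(vertices[src])
    return vertices

def neighbours_index_free(v: Vertex) -> list[str]:
    return [w.name for w in v.outgoing]

vertices = build_vertices(EDGES)
```

Storing both `outgoing` and `incoming` lists is what makes the "bidirectional joins are precomputed" point hold: walking a relationship backwards is as cheap as walking it forwards.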


GRAPH COMPUTE ENGINES

‣ A graph compute engine is a technology that enables global graph computational algorithms to be run against large datasets

‣ The architecture includes a system of record (SOR) database with OLTP properties

‣ Periodically, an Extract, Transform, and Load (ETL) job moves data from the system of record database into the graph compute engine for offline querying and analysis
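The ETL pipeline described above might look like the following minimal sketch, assuming an in-memory SQLite database as a stand-in system of record with a hypothetical `follows(src, dst)` table. Connected components is used here only as an example of a global algorithm, one that must visit the whole graph rather than a local neighbourhood.

```python
import sqlite3
from collections import defaultdict

def extract(conn):
    # Extract: read the relationships held in the (hypothetical) SOR table.
    return conn.execute("SELECT src, dst FROM follows").fetchall()

def transform(rows):
    # Transform/load: build an undirected adjacency map; here it stands in
    # for the graph as loaded into the compute engine.
    adj = defaultdict(set)
    for src, dst in rows:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj

def connected_components(adj):
    # Global algorithm: every node is visited exactly once via iterative DFS.
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        components.append(comp)
    return components

sor = sqlite3.connect(":memory:")
sor.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
sor.executemany("INSERT INTO follows VALUES (?, ?)", [("a", "b"), ("b", "c"), ("x", "y")])
components = connected_components(transform(extract(sor)))
```

In a real deployment the ETL job would run periodically, so the compute engine analyses a snapshot that may lag behind the system of record.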


WHY USE GRAPH DATABASES

‣ Performance

‣ In contrast to relational databases, where join-intensive query performance deteriorates as the dataset gets bigger, graph database performance tends to remain relatively constant even as the dataset grows. This is because queries are localized to a portion of the graph

‣ Flexibility

‣ Structure and schema can emerge with our growing understanding of the problem space

‣ Graphs are naturally additive, meaning we can add new kinds of relationships, new nodes, new labels, and new subgraphs to an existing structure without disturbing existing queries and application functionality

‣ Semantic lifting and expansion are naturally implemented on graphs

‣ Integration with heterogeneous sources is also more natural in graph databases
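Additivity can be illustrated with a toy triple list, a hypothetical stand-in for a graph store: adding new relationship types, nodes and a small subgraph leaves the result of an existing query unchanged, with no schema migration.

```python
# Hypothetical graph store: a list of (start, relationship type, end) triples.
graph = [("alice", "KNOWS", "bob"), ("bob", "KNOWS", "carol")]

def friends_of(person, edges):
    # Existing query: it only looks at KNOWS relationships leaving `person`.
    return sorted(end for start, rel, end in edges if start == person and rel == "KNOWS")

before = friends_of("alice", graph)

# Additive change: a new relationship type, new nodes and a new subgraph.
graph.append(("alice", "WORKS_AT", "acme"))
graph.append(("acme", "LOCATED_IN", "milan"))

after = friends_of("alice", graph)  # same answer as `before`
```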


WHY USE GRAPH DATABASES

‣ Agility

‣ Governance is typically applied programmatically, using tests to drive out the data model and queries, as well as to assert the business rules that depend upon the graph
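A minimal sketch of test-driven governance, assuming a hypothetical business rule that every `Order` node is placed by exactly one `Customer`. The in-memory graph, the rule and the helper name are illustrative only; in practice the assertion would run queries against the real database.

```python
import unittest

# Hypothetical graph: node id -> label, plus (start, type, end) relationships.
NODES = {"c1": "Customer", "o1": "Order", "o2": "Order"}
RELS = [("c1", "PLACED", "o1"), ("c1", "PLACED", "o2")]

def orders_violating_rule(nodes, rels):
    """Return Order ids that are not PLACED by exactly one Customer (assumed rule)."""
    violations = []
    for node_id, label in nodes.items():
        if label != "Order":
            continue
        placers = [start for start, rel, end in rels
                   if rel == "PLACED" and end == node_id and nodes.get(start) == "Customer"]
        if len(placers) != 1:
            violations.append(node_id)
    return violations

class GraphRulesTest(unittest.TestCase):
    """Business-rule test that would gate changes to the data model or queries."""

    def test_every_order_has_exactly_one_customer(self):
        self.assertEqual(orders_violating_rule(NODES, RELS), [])
```

The test case can be run by pointing `python -m unittest` at the module that contains it; when the model evolves, a failing rule shows up as a failing test.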


RELATIONAL DATABASES LACK RELATIONSHIPS

‣ Join tables add accidental complexity; they mix business data with foreign key metadata

‣ Foreign key constraints add additional development and maintenance overhead

‣ Sparse tables with nullable columns require special checking in code

‣ Several expensive joins are often needed

‣ Reciprocal queries are even more costly


RELATIONAL DATABASES LACK RELATIONSHIPS

‣ Relational databases struggle with highly connected domains

‣ To understand the cost of performing connected queries in a relational database, we’ll look at some simple and not-so-simple queries in a social network domain

‣ Who are Bob's friends?

SELECT p1.Person
FROM Person p1 JOIN PersonFriend
  ON PersonFriend.FriendID = p1.ID
JOIN Person p2
  ON PersonFriend.PersonID = p2.ID
WHERE p2.Person = 'Bob'


RELATIONAL DATABASES LACK RELATIONSHIPS

‣ The reciprocal query asks who is friends with Bob; it swaps the roles of PersonID and FriendID

SELECT p1.Person
FROM Person p1 JOIN PersonFriend
  ON PersonFriend.PersonID = p1.ID
JOIN Person p2
  ON PersonFriend.FriendID = p2.ID
WHERE p2.Person = 'Bob'
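Both friendship queries can be tried against an in-memory SQLite database. The table and column names follow the slides; the three people and two friendship rows are hypothetical sample data, chosen so that the two directions give different answers. Reversing the question only swaps the join columns, yet each direction still needs two joins through `PersonFriend`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Person TEXT);
    CREATE TABLE PersonFriend (PersonID INTEGER, FriendID INTEGER);
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Zach');
    -- Hypothetical data: Bob lists Zach as a friend, Alice lists Bob.
    INSERT INTO PersonFriend VALUES (2, 3), (1, 2);
""")

# Who are Bob's friends? (rows where Bob is the PersonID)
BOBS_FRIENDS = """
    SELECT p1.Person
    FROM Person p1 JOIN PersonFriend
      ON PersonFriend.FriendID = p1.ID
    JOIN Person p2
      ON PersonFriend.PersonID = p2.ID
    WHERE p2.Person = 'Bob'
"""

# Who is friends with Bob? (rows where Bob is the FriendID)
FRIENDS_OF_BOB = """
    SELECT p1.Person
    FROM Person p1 JOIN PersonFriend
      ON PersonFriend.PersonID = p1.ID
    JOIN Person p2
      ON PersonFriend.FriendID = p2.ID
    WHERE p2.Person = 'Bob'
"""

bobs_friends = [row[0] for row in conn.execute(BOBS_FRIENDS)]
friends_of_bob = [row[0] for row in conn.execute(FRIENDS_OF_BOB)]
```

Because the friendship relation is stored in one direction only, the database cannot answer the reciprocal question by reusing the first query's join path.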


NOSQL DATABASES ALSO LACK RELATIONSHIPS

‣ Seeing a reference to order: 1234 in the record beginning user: Alice, we infer a connection between user: Alice and order: 1234. This gives us false hope that we can use keys and values to manage graphs

‣ Because there are no identifiers that “point” backward (the foreign aggregate “links” are not reflexive, of course), we lose the ability to run other interesting queries on the database

‣ Aggregate stores do not maintain consistency of connected data, nor do they support what is known as index-free adjacency

‣ Aggregate stores must employ inherently latent methods for creating and querying relationships outside the data model
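The asymmetry can be sketched with a plain Python dictionary standing in for an aggregate store; the keys and documents are hypothetical, echoing the user: Alice / order: 1234 example. Following the embedded keys forward is cheap, but answering the reverse question requires scanning every aggregate.

```python
# Hypothetical key-value/document store: each aggregate is stored under its key.
store = {
    "user:Alice": {"name": "Alice", "orders": ["order:1234"]},
    "user:Bob": {"name": "Bob", "orders": ["order:5678"]},
    "order:1234": {"items": ["book"]},  # no link back to user:Alice
    "order:5678": {"items": ["pen"]},
}

def orders_of(user_key):
    # Forward direction: follow the embedded keys, one lookup per order.
    return [store[k] for k in store[user_key]["orders"]]

def owner_of(order_key):
    # Reverse direction: no backward pointer exists, so every aggregate must be
    # scanned (O(n)); keeping such links consistent is left to the application.
    return [key for key, value in store.items() if order_key in value.get("orders", [])]
```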


PERFORMANCE

‣ Graph databases are designed to traverse graphs, so they perform well when querying highly interconnected domains


QUERYING GRAPHS

‣ Cypher is an expressive (yet compact) graph database query language

‣ Other graph databases have other means of querying data. Many, including Neo4j, support the RDF query language SPARQL and the imperative, path-based query language Gremlin

(emil)<-[:KNOWS]-(jim)-[:KNOWS]->(ian)-[:KNOWS]->(emil)


QUERYING GRAPHS

(emil:Person {name:'Emil'})
  <-[:KNOWS]-(jim:Person {name:'Jim'})
  -[:KNOWS]->(ian:Person {name:'Ian'})
  -[:KNOWS]->(emil)


QUERYING GRAPHS

MATCH (a:Person {name:'Jim'})-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
RETURN b, c


QUERYING GRAPHS

MATCH (a:Person)-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
WHERE a.name = 'Jim'
RETURN b, c
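To show what this MATCH clause asks for, here is a naive nested-loop matcher over the Jim/Ian/Emil graph, with the KNOWS relationships written as a hypothetical set of `(start, end)` pairs. It is only a sketch of the pattern semantics; it says nothing about how Neo4j actually plans or executes the query.

```python
# KNOWS relationships from the example graph, as (start, end) pairs.
KNOWS = {("jim", "emil"), ("jim", "ian"), ("ian", "emil")}

def mutual_friends(a, knows):
    # Mirrors: MATCH (a)-[:KNOWS]->(b)-[:KNOWS]->(c), (a)-[:KNOWS]->(c) RETURN b, c
    results = []
    for x, b in sorted(knows):              # (a)-[:KNOWS]->(b)
        if x != a:
            continue
        for y, c in sorted(knows):          # (b)-[:KNOWS]->(c)
            if y == b and (a, c) in knows:  # and (a)-[:KNOWS]->(c)
                results.append((b, c))
    return results
```

For Jim this returns the single binding b = Ian, c = Emil: Ian is a friend of Jim who in turn knows another of Jim's friends.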


QUERYING GRAPHS

‣ Cypher Clauses

‣ WHERE: Provides criteria for filtering pattern matching results.

‣ CREATE and CREATE UNIQUE: Create nodes and relationships.

‣ MERGE: Ensures that the supplied pattern exists in the graph, either by reusing existing nodes and relationships that match the supplied predicates, or by creating new nodes and relationships.

‣ DELETE: Removes nodes and relationships (properties and labels are removed with REMOVE).

‣ SET: Sets property values.

‣ FOREACH: Performs an updating action for each element in a list.

‣ UNION: Merges results from two or more queries.

‣ WITH: Chains subsequent query parts and forwards results from one to the next. Similar to piping commands in Unix.
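The get-or-create behaviour of MERGE can be sketched for the single-node case as follows. The `merge_node` helper and its in-memory store are hypothetical; Cypher's MERGE also matches whole patterns and supports ON CREATE and ON MATCH actions, which this sketch leaves out.

```python
# Hypothetical in-memory node store: each node is {"labels": set, "props": dict}.
nodes: list[dict] = []

def merge_node(label: str, **props) -> dict:
    """Return an existing node with this label and these properties, or create one.

    Loosely mirrors MERGE (n:Label {key: value}) for a single node.
    """
    for node in nodes:
        if label in node["labels"] and all(node["props"].get(k) == v for k, v in props.items()):
            return node  # match found: reuse it
    node = {"labels": {label}, "props": dict(props)}
    nodes.append(node)   # no match: create it
    return node

first = merge_node("Person", name="Jim")
second = merge_node("Person", name="Jim")  # reuses the node created above
other = merge_node("Person", name="Ian")   # no match, so a new node is created
```

Running the same MERGE twice is therefore idempotent, whereas running CREATE twice would produce two Jim nodes.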


INCREMENTAL MODELING

‣ Graph databases provide for the smooth evolution of a data model

‣ We develop the data model feature by feature, user story by user story


INCREMENTAL MODELING

‣ If we need to find all the events that have occurred over a specific period, we can build a timeline tree
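A timeline tree might be sketched as nested year → month → day levels, with events attached to the day leaves. Here nested dictionaries stand in for the year, month and day nodes and the relationships between them; the helper names and sample events are hypothetical.

```python
from datetime import date

# Hypothetical timeline tree: year -> month -> day -> list of events.
timeline: dict[int, dict[int, dict[int, list[str]]]] = {}

def add_event(day: date, event: str) -> None:
    months = timeline.setdefault(day.year, {})
    days = months.setdefault(day.month, {})
    days.setdefault(day.day, []).append(event)

def events_between(start: date, end: date) -> list[str]:
    # Descend only into the year and month branches that overlap [start, end].
    found = []
    for year in sorted(y for y in timeline if start.year <= y <= end.year):
        for month in sorted(timeline[year]):
            if not (start.year, start.month) <= (year, month) <= (end.year, end.month):
                continue
            for d in sorted(timeline[year][month]):
                if start <= date(year, month, d) <= end:
                    found.extend(timeline[year][month][d])
    return found

add_event(date(2020, 1, 15), "launch")
add_event(date(2020, 3, 2), "audit")
add_event(date(2021, 6, 30), "release")
```

A range query only walks the branches of the tree that fall inside the period, rather than scanning every event.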


INCREMENTAL MODELING

‣ The carousel fraud


QUERYING GRAPHS

‣ POLE MODEL

‣ The POLE data model focuses on four basic types of entities and the relationships between them: Persons, Objects, Locations, and Events

(Figure: Greater Manchester, UK, from August 2017)


INTEGRATION WITH ONTOLOGIES

‣ An ontology is a formal, explicit specification of a shared conceptualization that is characterized by high semantic expressiveness required for increased complexity (Feilmayr and Wöß, 2016)

‣ Ontologies are typically represented as graphs

‣ Ontologies written in the Web Ontology Language (OWL) are typically represented using RDF triples

‣ Ontologies contain inference rules that can be applied to a knowledge base


INTEGRATION WITH ONTOLOGIES

‣ Taking an example from the LUBM benchmark (Lehigh University Benchmark), a student is derived to be an attendee if he or she takes some course

‣ That is, a student is an attendee when she matches the following ontological rule: Student and (takesCourse some) SubClassOf Attendee

‣ Any experienced Neo4j programmer may rub his or her hands, since this rule translates straightforwardly into the following Cypher expression:

match (x:Student)-[:takesCourse]->() set x:Attendee

‣ That is perfectly possible, but it could become cumbersome in the case of deeply nested rules that may also depend on each other

‣ For instance, the Cypher expression misses the subclasses of Student, such as UndergraduateStudent. Strictly speaking, the expression above should therefore read:

match (x)-[:takesCourse]->() where x:Student or x:UndergraduateStudent set x:Attendee
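The point about subclasses can be sketched as a tiny forward-chaining step: first compute every class that is (transitively) a subclass of `Student`, then apply the attendee rule to any individual carrying one of those labels. The hierarchy, individuals and course facts below are hypothetical, loosely modelled on LUBM names; a real reasoner would read them from the ontology rather than hard-code them.

```python
# Hypothetical fragment of a class hierarchy (child -> parent).
SUBCLASS_OF = {"UndergraduateStudent": "Student", "GraduateStudent": "Student"}

def subclasses(cls):
    # cls itself plus every class that is, transitively, a subclass of it.
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

# Hypothetical individuals (with their labels) and takesCourse facts.
labels = {"ann": {"UndergraduateStudent"}, "bob": {"Student"}, "carl": {"Student"}}
takes_course = {("ann", "c1"), ("bob", "c2")}

def apply_attendee_rule():
    # Student and (takesCourse some ...) SubClassOf Attendee, honouring subclasses.
    students = subclasses("Student")
    for person, person_labels in labels.items():
        takes_some = any(p == person for p, _ in takes_course)
        if person_labels & students and takes_some:
            person_labels.add("Attendee")

apply_attendee_rule()
```

Deriving the subclass closure from the hierarchy, instead of listing UndergraduateStudent by hand in the query, is what keeps such rules maintainable as the ontology grows.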