BcnOnRails May - 2010 - On Graph Databases 1 On Graph Databases Pere Urbón Bayes [email protected] May of 2010

Bcn On Rails May2010 On Graph Databases

Jan 27, 2015

Short introduction to graph databases at Bcn On Rails May 2010.
Transcript
Page 1: Bcn On Rails May2010 On Graph Databases

BcnOnRails May - 2010 - On Graph Databases 1

On Graph Databases

Pere Urbón [email protected]

May of 2010

Page 2: Bcn On Rails May2010 On Graph Databases

On Graph Databases

● NoSQL movement.
● Graph databases.
● Pros and cons.
● Use cases.
● Technology overview.
● Example.

Page 3: Bcn On Rails May2010 On Graph Databases

NoSQL Movement

● Next Generation of Databases.

● Innovative. (?)

● Open Source. (?)

● Non-Relational.

● Schema-less.

● Distributed.

● Scalable.

Page 4: Bcn On Rails May2010 On Graph Databases

NoSQL Movement

● Stores.

– Document.

– Key/Value.

– Object oriented.

– Column.

– Graph database.

● More Stores.

– Grid database.

– XML Database.

– RDF.

– .....

Page 5: Bcn On Rails May2010 On Graph Databases

NoSQL Movement

● NoSQL is not the holy grail, never forget it.

● Its precursors and roots go back to the early 1970s.

– Network databases, Charles Bachman 1969.

案ずるより産むが易し。– Giving birth to a baby is easier than worrying about it.

Page 6: Bcn On Rails May2010 On Graph Databases

Graph Databases

● Data strongly related.

– Social networks.

– GIS Systems.

– Transportation.

– Bibliographic.

– File systems.

– ........

GitHub Ruby community by country

Page 7: Bcn On Rails May2010 On Graph Databases

Graph Databases

● The Property Graph.

– Labeled.

– Directed.

– Attributed.

– Multigraph.

● Talk about.

– Nodes with types.

– Edges with types.

– Attributes.
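
As a rough illustration of the property graph terms above, here is a minimal in-memory sketch in plain Ruby. The names `PropertyGraph`, `GraphNode` and `GraphEdge` are hypothetical and chosen only for this example. They are not part of Neo4j, DEX or any other product mentioned in this talk.

```ruby
# Minimal in-memory property graph (illustrative sketch only).
GraphNode = Struct.new(:id, :type, :attributes)
GraphEdge = Struct.new(:id, :type, :source, :target, :attributes)

class PropertyGraph
  attr_reader :nodes, :edges

  def initialize
    @nodes = {}
    @edges = []
  end

  # Labeled and attributed: every node carries a type and a hash of attributes.
  def add_node(id, type, attributes = {})
    @nodes[id] = GraphNode.new(id, type, attributes)
  end

  # Directed: the edge goes from source to target.
  # Multigraph: several edges between the same pair of nodes are allowed.
  def add_edge(type, source, target, attributes = {})
    edge = GraphEdge.new(@edges.size, type, source, target, attributes)
    @edges << edge
    edge
  end

  # Outgoing edges of a node, optionally filtered by edge type.
  def outgoing(id, type = nil)
    @edges.select { |e| e.source == id && (type.nil? || e.type == type) }
  end
end

graph = PropertyGraph.new
graph.add_node(:alice, :person, name: 'Alice')
graph.add_node(:bob, :person, name: 'Bob')
graph.add_edge(:friends, :alice, :bob, since: 1982)
graph.add_edge(:colleagues, :alice, :bob) # second edge between the same pair

graph.outgoing(:alice).size                            # => 2
graph.outgoing(:alice, :friends).first.attributes      # => {:since=>1982}
```

Scanning every edge in `outgoing` keeps the sketch short. A real graph database would index edges per node instead.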

Page 8: Bcn On Rails May2010 On Graph Databases

Graph Databases

● Graph storage.

– Adjacency Matrix.

– Adjacency List.

– Incidence Matrix.

– Incidence List.

● GraphDBs.

– Bitmaps.

– B+Trees.

– RB Trees.
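
To make the first two storage options concrete, the sketch below builds an adjacency matrix and an adjacency list for the same small directed graph in plain Ruby. The sample edges are made up for illustration. The on-disk structures listed under GraphDBs (bitmaps, B+trees, RB trees) are not modelled here.

```ruby
# Two in-memory representations of the same directed graph
# (vertices 0..3, edges 0->1, 0->2, 2->3).
edges = [[0, 1], [0, 2], [2, 3]]
vertex_count = 4

# Adjacency matrix: O(V^2) space, constant-time edge lookup.
matrix = Array.new(vertex_count) { Array.new(vertex_count, false) }
edges.each { |from, to| matrix[from][to] = true }

# Adjacency list: O(V + E) space, neighbours stored per vertex.
adjacency = Hash.new { |hash, key| hash[key] = [] }
edges.each { |from, to| adjacency[from] << to }

matrix[0][2]  # => true  (is there an edge 0->2?)
matrix[2][0]  # => false (the graph is directed)
adjacency[0]  # => [1, 2] (neighbours of vertex 0)
```

The matrix suits dense graphs and fast edge checks. The list is usually the better fit for sparse graphs, where most vertex pairs are not connected.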

Page 9: Bcn On Rails May2010 On Graph Databases

Graph Databases

                 RDBMS      OIM        DEX
data             27.36 GB   54 GB      9.69 GB
ratio overhead   10,9       21,51      3,86
load time        52891 s    17543 s    95579 s

Query            MySQL      OIM        DEX
Q1:count         20,38      17,35      0
Q2:scan          32,76      174,64     3,14
Q3:select        7,34       5,43       0,84
Q4:projection    17,34      43,7       33,19
Q5:combine       0,74       2,61       0,01
Q6:explode       0,07       202,07     0,01
Q7:values        12,28      20,77      0,01
Q8:hub           >3hours    >3hours    624,68

Page 10: Bcn On Rails May2010 On Graph Databases

Graph Databases

Page 11: Bcn On Rails May2010 On Graph Databases

Use cases

● Network analysis.
● Link analysis.
● Graph mining.
● Neural networks.
● Bibliographic search.
● Semantic web.

Page 12: Bcn On Rails May2010 On Graph Databases

Use cases

● Algorithmic recruitment with GitHub.
  – Centrality: the importance of a vertex within a graph.
    ● Betweenness: vertices that occur on many shortest paths have higher centrality.
      – O(V^3) without any optimization.
● Other possible choices:
  – Closeness: vertices with a short geodesic distance to the other vertices have a high closeness.
    ● Usually preferred in network analysis.
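
The two measures can be sketched in plain Ruby as follows. This is only an illustration on a toy graph, not the code used for the GitHub analysis. Betweenness is computed with Brandes' accumulation scheme (one BFS per source vertex), which the slides do not mention and which avoids the naive O(V^3) cost on sparse graphs. The method names and the sample graph are assumptions made for this example.

```ruby
# Betweenness centrality for an unweighted, undirected graph.
# adjacency maps every vertex to the array of its neighbours.
def betweenness(adjacency)
  scores = {}
  adjacency.each_key { |v| scores[v] = 0.0 }

  adjacency.each_key do |source|
    order = []                                   # vertices in BFS visit order
    predecessors = Hash.new { |h, k| h[k] = [] } # shortest-path predecessors
    paths = Hash.new(0)                          # shortest-path counts from source
    paths[source] = 1
    distance = { source => 0 }
    queue = [source]

    until queue.empty?
      v = queue.shift
      order << v
      adjacency[v].each do |w|
        unless distance.key?(w)
          distance[w] = distance[v] + 1
          queue << w
        end
        if distance[w] == distance[v] + 1
          paths[w] += paths[v]
          predecessors[w] << v
        end
      end
    end

    # Accumulate dependencies from the farthest vertices back to the source.
    dependency = Hash.new(0.0)
    order.reverse_each do |w|
      predecessors[w].each do |v|
        dependency[v] += paths[v].to_f / paths[w] * (1 + dependency[w])
      end
      scores[w] += dependency[w] unless w == source
    end
  end

  # Every undirected pair was counted once from each endpoint.
  scores.transform_values { |score| score / 2 }
end

# Closeness of one vertex: inverse of the summed BFS distances
# to all vertices reachable from it.
def closeness(adjacency, source)
  distance = { source => 0 }
  queue = [source]
  until queue.empty?
    v = queue.shift
    adjacency[v].each do |w|
      next if distance.key?(w)
      distance[w] = distance[v] + 1
      queue << w
    end
  end
  total = distance.values.sum
  total.zero? ? 0.0 : 1.0 / total
end

# Path graph a - b - c - d
path_graph = { a: [:b], b: [:a, :c], c: [:b, :d], d: [:c] }

betweenness(path_graph) # => {:a=>0.0, :b=>2.0, :c=>2.0, :d=>0.0}
closeness(path_graph, :b) # => 0.25
```

On the path graph the two inner vertices sit on every shortest path between the ends, so they get the highest betweenness and also the highest closeness.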

Page 13: Bcn On Rails May2010 On Graph Databases

Graph Databases

● Shortest Paths.

– BFS/DFS.

– Dijkstra.

– Floyd-Warshall.

– Ford.

● Connectivity.

– Strongly connected.

– Weakly connected.

● Centrality.

– Betweenness.

– Closeness.

– Diameter.

– Radius.

● Traversals.

– BFS/DFS.

● Communities.

● Staining.
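
As one example from the shortest-path family above, here is a small Dijkstra sketch in plain Ruby. It assumes non-negative edge weights and uses a linear scan instead of a priority queue, so it runs in O(V^2). The method names and the `routes` sample data are hypothetical and only serve the illustration.

```ruby
# Dijkstra's shortest paths on a weighted directed graph.
# graph maps every vertex to a hash of { neighbour => non-negative weight }.
def dijkstra(graph, source)
  distance = Hash.new(Float::INFINITY)
  distance[source] = 0
  previous = {}
  unvisited = graph.keys

  until unvisited.empty?
    # Linear scan for the closest unvisited vertex (no heap).
    current = unvisited.min_by { |v| distance[v] }
    break if distance[current] == Float::INFINITY # the rest is unreachable
    unvisited.delete(current)

    graph[current].each do |neighbour, weight|
      candidate = distance[current] + weight
      if candidate < distance[neighbour]
        distance[neighbour] = candidate
        previous[neighbour] = current
      end
    end
  end

  [distance, previous]
end

# Rebuilds the path to target by following the previous-vertex links backwards.
# Returns an empty array when target is not reachable from source.
def path_to(previous, source, target)
  path = [target]
  path.unshift(previous[path.first]) while path.first != source && previous.key?(path.first)
  path.first == source ? path : []
end

routes = {
  a: { b: 7, c: 2 },
  b: { d: 1 },
  c: { b: 3, d: 8 },
  d: {}
}

distance, previous = dijkstra(routes, :a)
distance[:d]              # => 6
path_to(previous, :a, :d) # => [:a, :c, :b, :d]
```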

Page 14: Bcn On Rails May2010 On Graph Databases

Pros and cons

● Data facts.

– Grows exponentially.

– Huge interdependency and complexity.

– Relationships are important.

– Structure changes over time.

● Relational model facts.

– E. F. Codd's model.

– Normalization.

– Object-Relational impedance mismatch.

– Joins don't scale.

– Big tables.

– Denormalization.

Page 15: Bcn On Rails May2010 On Graph Databases

Technology overview

● Neo4j: Open source NoSQL graph database.
● Dex: The high performance graph database.
● HyperGraphDB: An AI and semantic web graph database.
● Infogrid: The Internet Graph database.
● Sones: SaaS .NET graph database.
● VertexDB: High performance database server.

Page 16: Bcn On Rails May2010 On Graph Databases

Benchmarking

Scale 15

Kernel              DEX      Neo4j    Jena     HypergraphDB
K1 Load (s)         7,44     697      141      +24h
K2 Scan edges (s)   0,0010   2,71     0,689
K3 2-hops (s)       0,0120   0,0260   0,443
K4 BC (s)           14,8     8,24     138
Db size (MB)        30       17       207

Scale 20

Kernel              DEX      Neo4j    Jena     HypergraphDB
K1 Load (s)         317      32.094   4.560    +24h
K2 Scan edges (s)   0,005    751      18,6
K3 2-hops (s)       0,033    0,0230   0,4580
K4 BC (s)           617      7.027    59.512
Db size (MB)        893      539      6.656

Graph Database Performance on the HPC Scalable Graph Analysis Benchmark

Page 17: Bcn On Rails May2010 On Graph Databases

Technology overview

Page 18: Bcn On Rails May2010 On Graph Databases

Technology overview

● Neo4j.rb (JRuby target)

– Active record integration.

– Dynamic and schema free.

– Fast traversal of relationships.

– Transactions with rollbacks support.

– Indexing and querying of ruby objects.

– Massive loaders.

– Ruby on Rails integration.

– Accessible through REST.

http://wiki.neo4j.org/content/Ruby

Page 19: Bcn On Rails May2010 On Graph Databases

Technology overview

require "rubygems"
require 'neo4j'

Creating nodes

Neo4j::Transaction.run do
  node = Neo4j::Node.new
end

Transactions over blocks

Neo4j::Transaction.run do
  # neo4j operations go here
end

Properties

node = Neo4j::Node.new
node[:name] = 'foo'
node[:age] = 123
node[:hungry] = false
node[4] = 3.14
node[:age] # => 123

Creating relationships

node1 = Neo4j::Node.new
node2 = Neo4j::Node.new
Neo4j::Relationship.new(:friends, node1, node2)

# which is the same as
node1.rels.outgoing(:friends) << node2

Page 20: Bcn On Rails May2010 On Graph Databases

Technology overview

Accessing relationships

node1.rels.empty? # => false

# The rels method returns an enumeration of relationship objects.
# The nodes method on the relationships returns the nodes instead.
node1.rels.nodes.include?(node2) # => true

node1.rels.first # => the first relationship this node1 has.
node1.rels.nodes.first # => node2 first node of any relationship type
node2.rels.incoming(:friends).nodes.first # => node1 first node of relationship type 'friends'
node2.rels.incoming(:friends).first # => a relationship object between node1 and node2

Properties on Relationships

rel = node1.rels.outgoing(:friends).first
rel[:since] = 1982
node1.rels.first[:since] # => 1982

Page 21: Bcn On Rails May2010 On Graph Databases

Example

Just for fun, let's play a little with a graph database.

Page 22: Bcn On Rails May2010 On Graph Databases

On Graph Databases

Thank you!

Pere Urbón Bayes

[email protected]