
Page 1: Graph Databases and the Future of Large-Scale Knowledge Management

Graph Databases and the Future of Large-Scale Knowledge Management

Marko A. Rodriguez
T-5, Center for Nonlinear Studies
Los Alamos National Laboratory

http://markorodriguez.com

April 8, 2009

Page 2: Graph Databases and the Future of Large-Scale Knowledge Management

Abstract

Modern day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis.

Risk Symposium – Santa Fe, New Mexico – April 8, 2009

Page 3: Graph Databases and the Future of Large-Scale Knowledge Management

Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data


Page 5: Graph Databases and the Future of Large-Scale Knowledge Management

The Relational Database vs. the Graph Database

• A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model is a collection of interlinked tables.

• A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model is a multi-relational graph.

[Figure: a Relational Database (127.0.0.1) beside a Graph Database (127.0.0.2); the graph's vertices are labeled a, b, c, d.]


Page 6: Graph Databases and the Future of Large-Scale Knowledge Management

Types of Graphs

• Undirected single-relational graph: homogeneous set of symmetric links.

• Directed single-relational graph: homogeneous set of links.

• Directed multi-relational graph: heterogeneous set of links.

[Figure: three example graphs over vertices x, y, z, captioned "undirected single-relational graph", "directed single-relational graph", and "directed multi-relational graph".]
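
The three graph types can be sketched as plain data structures. This is only an illustration: the vertex names follow the figure, while the edge labels `knows` and `created` are made up here and do not come from the slides.

```python
# Undirected single-relational graph: unordered vertex pairs, one implicit link type.
undirected = {frozenset({"x", "y"}), frozenset({"y", "z"})}

# Directed single-relational graph: ordered (tail, head) pairs, one implicit link type.
directed = {("x", "y"), ("y", "z")}

# Directed multi-relational graph: (tail, label, head) triples; the label tells
# relation types apart. "knows" and "created" are hypothetical labels.
multi_relational = {
    ("x", "knows", "y"),
    ("y", "created", "z"),
    ("x", "created", "z"),
}


def labels(graph):
    """Return the distinct edge labels of a multi-relational graph."""
    return {label for _, label, _ in graph}
```

Only the third structure can say that x relates to y in one way and to z in another, which is what the rest of the deck relies on.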


Page 7: Graph Databases and the Future of Large-Scale Knowledge Management

Our Make Believe World - Phase 1

• Marko is a human and Fluffy is a dog.


Page 8: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Relational Database - Phase 1

Object_Table
ID    Name    Type   Legs  Fur
0001  Marko   Human  2     false
0002  Fluffy  Dog    4     true


Page 9: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Graph Database - Phase 1

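[Figure: graph with the following labeled edges]
0001 --name--> Marko
0001 --type--> Human
0001 --legs--> 2
0001 --fur--> false
0002 --name--> Fluffy
0002 --type--> Dog
0002 --legs--> 4
0002 --fur--> true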


Page 10: Graph Databases and the Future of Large-Scale Knowledge Management

Our Make Believe World - Phase 2

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.


Page 11: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Relational Database - Phase 2

Object_Table
ID    Name    Type   Legs  Fur
0001  Marko   Human  2     false
0002  Fluffy  Dog    4     true

Friendship_Table
ID1   ID2
0001  0002
0002  0001


Page 12: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Graph Database - Phase 2

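[Figure: graph with the following labeled edges]
0001 --name--> Marko
0001 --type--> Human
0001 --legs--> 2
0001 --fur--> false
0002 --name--> Fluffy
0002 --type--> Dog
0002 --legs--> 4
0002 --fur--> true
0001 --friend--> 0002
0002 --friend--> 0001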


Page 13: Graph Databases and the Future of Large-Scale Knowledge Management

Our Make Believe World - Phase 3

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.


Page 14: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Relational Database - Phase 3

Object_Table
ID    Name    Type   Legs  Fur
0001  Marko   Human  2     false
0002  Fluffy  Dog    4     true

Friendship_Table
ID1   ID2
0001  0002
0002  0001

Subclass_Table
Type1  Type2
Human  Mammal
Dog    Mammal


Page 15: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Graph Database - Phase 3

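[Figure: graph with the following labeled edges]
0001 --name--> Marko
0001 --type--> Human
0001 --legs--> 2
0001 --fur--> false
0002 --name--> Fluffy
0002 --type--> Dog
0002 --legs--> 4
0002 --fur--> true
0001 --friend--> 0002
0002 --friend--> 0001
Human --subclassof--> Mammal
Dog --subclassof--> Mammal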


Page 16: Graph Databases and the Future of Large-Scale Knowledge Management

Our Make Believe World - Phase 4

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.

• Fluffy peed on the carpet.


Page 17: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Relational Database - Phase 4

Object_Table
ID    Name    Type    Legs  Fur
0001  Marko   Human   2     false
0002  Fluffy  Dog     4     true
0003  My_Rug  Carpet  N/A   N/A

Friendship_Table
ID1   ID2
0001  0002
0002  0001

Subclass_Table
Type1  Type2
Human  Mammal
Dog    Mammal

Pee_Table
ID1   ID2
0002  0003


Page 18: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Graph Database - Phase 4

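[Figure: graph with the following labeled edges]
0001 --name--> Marko
0001 --type--> Human
0001 --legs--> 2
0001 --fur--> false
0002 --name--> Fluffy
0002 --type--> Dog
0002 --legs--> 4
0002 --fur--> true
0001 --friend--> 0002
0002 --friend--> 0001
Human --subclassof--> Mammal
Dog --subclassof--> Mammal
0002 --peedOn--> 0003
0003 --name--> My_Rug
0003 --type--> Carpet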


Page 19: Graph Databases and the Future of Large-Scale Knowledge Management

Our Make Believe World - Phase 5

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.

• Fluffy peed on the carpet.

• Marko and Fluffy are both mammals.


Page 20: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Relational Database - Phase 5

Object_Table
ID    Name    Type    Legs  Fur
0001  Marko   Human   2     false
0002  Fluffy  Dog     4     true
0003  My_Rug  Carpet  N/A   N/A

Friendship_Table
ID1   ID2
0001  0002
0002  0001

Subclass_Table
Type1  Type2
Human  Mammal
Dog    Mammal

Pee_Table
ID1   ID2
0002  0003

Type_Table
ID    Type
0001  Human
0002  Dog
0003  Carpet
0001  Mammal
0002  Mammal


Page 21: Graph Databases and the Future of Large-Scale Knowledge Management

Our World Modeled in a Graph Database - Phase 5

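[Figure: graph with the following labeled edges]
0001 --name--> Marko
0001 --type--> Human
0001 --type--> Mammal
0001 --legs--> 2
0001 --fur--> false
0002 --name--> Fluffy
0002 --type--> Dog
0002 --type--> Mammal
0002 --legs--> 4
0002 --fur--> true
0001 --friend--> 0002
0002 --friend--> 0001
Human --subclassof--> Mammal
Dog --subclassof--> Mammal
0002 --peedOn--> 0003
0003 --name--> My_Rug
0003 --type--> Carpet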
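
The two new `type` edges of Phase 5 follow from edges the graph already holds: a `type` edge followed by a `subclassof` edge. A rough Python sketch of that derivation, assuming the graph is kept as a set of (tail, label, head) tuples (a representation chosen here for illustration, not one prescribed by the talk):

```python
# Phase 4 edges relevant to typing; vertex ids and labels follow the slides.
edges = {
    ("0001", "type", "Human"),
    ("0002", "type", "Dog"),
    ("0003", "type", "Carpet"),
    ("Human", "subclassof", "Mammal"),
    ("Dog", "subclassof", "Mammal"),
    ("0001", "friend", "0002"),
    ("0002", "friend", "0001"),
    ("0002", "peedOn", "0003"),
}


def infer_types(edges):
    """Add (x, type, C2) whenever (x, type, C1) and (C1, subclassof, C2) hold."""
    inferred = set(edges)
    changed = True
    while changed:  # repeat so deeper class hierarchies are covered as well
        changed = False
        for x, p, c1 in list(inferred):
            if p != "type":
                continue
            for c, q, c2 in list(inferred):
                if q == "subclassof" and c == c1 and (x, "type", c2) not in inferred:
                    inferred.add((x, "type", c2))
                    changed = True
    return inferred


phase5 = infer_types(edges)
```

Under these assumptions `phase5` gains exactly the two Mammal edges for 0001 and 0002, while the carpet stays a carpet.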


Page 22: Graph Databases and the Future of Large-Scale Knowledge Management

The Graph as the Natural World Model

• The world is inherently (or perceived as) object-oriented.

• The world is filled with objects and relations among them.

• The multi-relational graph is a very natural representation of the world.


Page 23: Graph Databases and the Future of Large-Scale Knowledge Management

The Graph as the Natural Programming Model

• High-level computer languages are object-oriented.

• Nearly no impedance mismatch between the multi-relational graph and the programming object.

• It is easy to go from graph database to in-memory object.

Human marko = new Human();
marko.name = "Marko";
marko.addFriend(fluffy);
marko.setHasFur(false);
marko.setLegs(2);
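
The step from graph to in-memory object can be sketched in a few lines. The Python below is an illustration only: the `Thing` class and `load` function are hypothetical names, and the edge list reuses the Phase 2 graph.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    """In-memory object whose fields mirror a vertex's outgoing edges."""
    id: str
    name: str = ""
    type: str = ""
    legs: int = 0
    fur: bool = False
    friends: list = field(default_factory=list)


def load(vertex_id, edges):
    """Build a Thing from the outgoing edges of vertex_id."""
    obj = Thing(id=vertex_id)
    for tail, label, head in edges:
        if tail != vertex_id:
            continue
        if label == "friend":
            obj.friends.append(head)   # multi-valued property
        else:
            setattr(obj, label, head)  # single-valued property
    return obj


edges = [
    ("0001", "name", "Marko"), ("0001", "type", "Human"),
    ("0001", "legs", 2), ("0001", "fur", False),
    ("0001", "friend", "0002"),
]
marko = load("0001", edges)
```

Each edge label maps directly onto a field name, which is the "nearly no impedance mismatch" point in miniature.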


Page 24: Graph Databases and the Future of Large-Scale Knowledge Management

SQL vs. SPARQL

SELECT OTY.Name
FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
WHERE OTX.Name = 'Marko'
  AND Friendship_Table.ID1 = OTY.ID
  AND Friendship_Table.ID2 = OTX.ID;

SELECT ?z WHERE {
  ?x name "Marko" .
  ?y friend ?x .
  ?y name ?z
}
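
Both queries ask for the names of everyone who lists Marko as a friend. The sketch below runs the SQL against an in-memory SQLite copy of the Phase 2 tables, then answers the same question over triples with a hand-rolled pattern match. The match only imitates the SPARQL pattern; it is not a SPARQL engine, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Object_Table (ID TEXT, Name TEXT, Type TEXT, Legs INTEGER, Fur INTEGER);
    CREATE TABLE Friendship_Table (ID1 TEXT, ID2 TEXT);
    INSERT INTO Object_Table VALUES ('0001', 'Marko', 'Human', 2, 0);
    INSERT INTO Object_Table VALUES ('0002', 'Fluffy', 'Dog', 4, 1);
    INSERT INTO Friendship_Table VALUES ('0001', '0002');
    INSERT INTO Friendship_Table VALUES ('0002', '0001');
""")

# The slide's SQL query: a self-join on Object_Table through Friendship_Table.
sql_names = [row[0] for row in conn.execute("""
    SELECT OTY.Name
    FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
    WHERE OTX.Name = 'Marko'
      AND Friendship_Table.ID1 = OTY.ID
      AND Friendship_Table.ID2 = OTX.ID
""")]

# The same facts as (subject, predicate, object) triples.
triples = {
    ("0001", "name", "Marko"), ("0002", "name", "Fluffy"),
    ("0001", "friend", "0002"), ("0002", "friend", "0001"),
}

# Imitation of the slide's graph pattern:
#   ?x name "Marko" . ?y friend ?x . ?y name ?z
graph_names = [
    z
    for (x, p1, o1) in triples if p1 == "name" and o1 == "Marko"
    for (y, p2, o2) in triples if p2 == "friend" and o2 == x
    for (s3, p3, z) in triples if p3 == "name" and s3 == y
]
```

The relational version has to name the tables and join columns, while the graph version just walks three edges.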


Page 25: Graph Databases and the Future of Large-Scale Knowledge Management

Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data


Page 26: Graph Databases and the Future of Large-Scale Knowledge Management

Internet Address Spaces

• The Uniform Resource Identifier (URI) is the superclass of the Uniform Resource Locator (URL) and Uniform Resource Name (URN).


Page 27: Graph Databases and the Future of Large-Scale Knowledge Management

The Uniform Resource Locator

• The set of all URLs is the address space of all resources that can be located and retrieved on the Web. URLs denote where a resource is.

  ⋆ http://markorodriguez.com/index.html
    ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6
    ∗ http:// means GET at port 80.
    ∗ /index.html means the resource to get at that Internet location.

[Figure: Web Server markorodriguez.com (216.251.43.6) holding index.html.]
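
The parts of the example URL can be pulled apart with Python's standard library. This is a small sketch of the decomposition only; it performs no DNS lookup or HTTP request.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://markorodriguez.com/index.html")

scheme = parts.scheme    # protocol used to retrieve the resource
host = parts.hostname    # name that DNS resolves to an IP address
path = parts.path        # resource requested from that host
# urlsplit reports no port when none is written; HTTP then defaults to 80.
port = parts.port if parts.port is not None else 80
```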


Page 28: Graph Databases and the Future of Large-Scale Knowledge Management

The Uniform Resource Name

• The set of all URNs is the address space of all resources within the urn: namespace.

  ⋆ urn:uuid:bd93def0-8026-11dd-842be54955baa12
  ⋆ urn:issn:0892-3310
  ⋆ urn:doi:10.1016/j.knosys.2008.03.030

• Named resources need not be retrievable through the Web.

• URNs denote what a resource is.


Page 29: Graph Databases and the Future of Large-Scale Knowledge Management

The Uniform Resource Identifier

• The URI address space is an infinite space for all Internet resources.

  ⋆ http://markorodriguez.com/index.html
  ⋆ urn:issn:0892-3310
  ⋆ ftp://markorodriguez.com/private/markos_secrets.txt
  ⋆ http://www.lanl.gov#fluffy

• Important: URIs can denote concepts, instances, and data.

[Figure: example URIs lanl:fluffy and lanl:fluffy_legs.]
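
A short sketch of how the example URIs differ by scheme, and how a prefixed name such as lanl:fluffy might expand to a full URI. The binding of `lanl:` to `http://www.lanl.gov#` is an assumption made here; the slides never state it.

```python
from urllib.parse import urlsplit

uris = [
    "http://markorodriguez.com/index.html",
    "urn:issn:0892-3310",
    "ftp://markorodriguez.com/private/markos_secrets.txt",
    "http://www.lanl.gov#fluffy",
]
# Every URI starts with a scheme; URLs and URNs are told apart by it.
schemes = [urlsplit(u).scheme for u in uris]

# Assumed prefix binding (hypothetical, not given in the slides).
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(prefixed_name):
    """Expand a prefixed name such as lanl:fluffy into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local
```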


Page 30: Graph Databases and the Future of Large-Scale Knowledge Management

The Web of Documents

• The Web of Documents is primarily concerned with the Hypertext Transfer Protocol (HTTP) and with retrievable resources in the URL address space.

• These retrievable resources are files: HTML documents, images, audio,etc. The “web” is created when HTML documents contain URLs.

[Figure: the HTML documents index.html, Home.html, Research.html, and Resume.html at http://markorodriguez.com/, connected by href links.]


Page 31: Graph Databases and the Future of Large-Scale Knowledge Management

The Web of Data

• The Web of Data is primarily concerned with URIs.

• The Resource Description Framework (RDF) is the standard for representing the relationship between URIs and literals (e.g. float, string, date time, etc.).

[Figure: RDF statements, each read as subject, predicate, object]
lanl:marko   foaf:knows  lanl:fluffy
lanl:marko   foaf:name   "Marko A. Rodriguez"^^xsd:string
lanl:fluffy  foaf:name   "Fluffy P. Everywhere"^^xsd:string
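
The statements in the figure can be written out with full URIs, one statement per line, in the style of the N-Triples syntax. A minimal sketch: the `foaf:` and `xsd:` namespaces are the published ones, the `lanl:` namespace is assumed, and literal escaping is left out.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
XSD = "http://www.w3.org/2001/XMLSchema#"
LANL = "http://www.lanl.gov#"  # assumed namespace for the lanl: prefix


def uri(u):
    """Wrap a full URI in angle brackets."""
    return f"<{u}>"


def literal(value, datatype):
    """Format a typed literal (no escaping of quotes or backslashes)."""
    return f'"{value}"^^<{datatype}>'


triples = [
    (uri(LANL + "marko"), uri(FOAF + "knows"), uri(LANL + "fluffy")),
    (uri(LANL + "marko"), uri(FOAF + "name"),
     literal("Marko A. Rodriguez", XSD + "string")),
    (uri(LANL + "fluffy"), uri(FOAF + "name"),
     literal("Fluffy P. Everywhere", XSD + "string")),
]

# One statement per line: subject, predicate, object, then " ."
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```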


Page 32: Graph Databases and the Future of Large-Scale Knowledge Management

Our Make Believe World in RDF

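[Figure: RDF graph with the following statements]
lanl:marko   rdf:type         lanl:Human
lanl:marko   rdf:type         lanl:Mammal
lanl:marko   foaf:name        "Marko A. Rodriguez"^^xsd:string
lanl:marko   lanl:legs        "2"^^xsd:integer
lanl:marko   lanl:fur         "false"^^xsd:boolean
lanl:marko   lanl:friend      lanl:fluffy
lanl:fluffy  rdf:type         lanl:Dog
lanl:fluffy  rdf:type         lanl:Mammal
lanl:fluffy  foaf:name        "Fluffy P. Everywhere"^^xsd:string
lanl:fluffy  lanl:legs        "4"^^xsd:integer
lanl:fluffy  lanl:fur         "true"^^xsd:boolean
lanl:fluffy  lanl:friend      lanl:marko
lanl:Human   rdfs:subClassOf  lanl:Mammal
lanl:Dog     rdfs:subClassOf  lanl:Mammal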


Page 33: Graph Databases and the Future of Large-Scale Knowledge Management

The Web of Data is a Distributed Database

• The URI address space is distributed.

• URIs can denote datum.

• RDF denotes the relationships between URIs.

• The Web of Data’s foundational standard is RDF.

• Therefore, the Web of Data is a distributed database.
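
A toy version of this argument: two stores that never coordinated can still be queried as one graph, because they name the same resource with the same URI. The two stores and their contents below are hypothetical, and prefixed names stand in for full URIs.

```python
# Triples held by two hypothetical, independently run graph databases.
store_a = {
    ("lanl:marko", "foaf:name", "Marko A. Rodriguez"),
    ("lanl:marko", "lanl:friend", "lanl:fluffy"),
}
store_b = {
    ("lanl:fluffy", "foaf:name", "Fluffy P. Everywhere"),
    ("lanl:fluffy", "rdf:type", "lanl:Dog"),
}

# Both stores use the same URI for Fluffy, so merging is a plain set union.
web_of_data = store_a | store_b


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# A query that crosses the store boundary: names of Marko's friends.
friend_names = {
    name
    for friend in objects(web_of_data, "lanl:marko", "lanl:friend")
    for name in objects(web_of_data, friend, "foaf:name")
}
```

Neither store alone can answer the query; the union can.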


Page 34: Graph Databases and the Future of Large-Scale Knowledge Management

The Web of Documents vs. the Web of Data

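[Figure: in the Web of Documents, HTML on a Web Server at 127.0.0.1 is linked by href to HTML on a Web Server at 127.0.0.2; in the Web of Data, a Graph Database at 127.0.0.1 is linked by lanl:friend to a Graph Database at 127.0.0.2.]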


Page 35: Graph Databases and the Future of Large-Scale Knowledge Management

The Current Web of Data - March 2009

[Figure: diagram of the interlinked data sets on the Web of Data. Data sets shown: geospecies, freebase, dbpedia, libris, geneid, interpro, hgnc, symbol, pubmed, mgi, geneontology, uniprot, pubchem, unists, omim, homologene, pfam, pdb, reactome, chebi, uniparc, kegg, cas, uniref, prodom, prosite, taxonomy, dailymed, linkedct, acm, dblprkbexplorer, laascnrs, newcastle, eprints, ecssouthampton, irittoulouse, citeseer, pisa, resex, ibm, ieee, rae2001, budapestbme, eurecom, dblphannover, diseasome, drugbank, geonames, yago, opencyc, w3cwordnet, umbel, linkedmdb, rdfbookmashup, flickrwrappr, surgeradio, musicbrainz, myspacewrapper, bbcplaycountdata, bbcprogrammes, semanticweborg, revyu, swconferencecorpus, lingvoj, pubguide, crunchbase, foafprofiles, riese, qdos, audioscrobbler, flickrexporter, bbcjohnpeel, wikicompany, govtrack, uscensusdata, openguides, doapspace, bbclatertotp, eurostat, semwebcentral, dblpberlin, siocsites, jamendo, magnatune, worldfactbook, projectgutenberg, opencalais, rdfohloh, virtuososponger.]


Page 36: Graph Databases and the Future of Large-Scale Knowledge Management

The Current Web of Data - March 2009

data set          domain      data set          domain      data set            domain
audioscrobbler    music       govtrack          government  pubguide            books
bbclatertotp      music       homologene        biology     qdos                social
bbcplaycountdata  music       ibm               computer    rae2001             computer
bbcprogrammes     media       ieee              computer    rdfbookmashup       books
budapestbme       computer    interpro          biology     rdfohloh            social
chebi             biology     jamendo           music       resex               computer
crunchbase        business    laascnrs          computer    riese               government
dailymed          medical     libris            books       semanticweborg      computer
dblpberlin        computer    lingvoj           reference   semwebcentral       social
dblphannover      computer    linkedct          medical     siocsites           social
dblprkbexplorer   computer    linkedmdb         movie       surgeradio          music
dbpedia           general     magnatune         music       swconferencecorpus  computer
doapspace         social      musicbrainz       music       taxonomy            reference
drugbank          medical     myspacewrapper    social      umbel               general
eurecom           computer    opencalais        reference   uniref              biology
eurostat          government  opencyc           general     unists              biology
flickrexporter    images      openguides        reference   uscensusdata        government
flickrwrappr      images      pdb               biology     virtuososponger     reference
foafprofiles      social      pfam              biology     w3cwordnet          reference
freebase          general     pisa              computer    wikicompany         business
geneid            biology     prodom            biology     worldfactbook       government
geneontology      biology     projectgutenberg  books       yago                general
geonames          geographic  prosite           biology     . . .


Page 37: Graph Databases and the Future of Large-Scale Knowledge Management

Cultural Differences that are Leading to a New World of Large-Scale Knowledge Management

• Relational databases tend to not maintain public access points.

• Relational database users tend to not publish their schemas.

• Web of Data graph databases maintain public access points called SPARQL end-points.

• Web of Data graph databases tend to reuse and extend public schemas called ontologies.
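
What a public access point looks like to a client can be sketched by building the request URL for a SPARQL query. The endpoint address below is a placeholder, not a real service, and nothing is sent over the network; the `query` parameter name comes from the SPARQL Protocol.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical end-point URL

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
"""

# The SPARQL Protocol carries the query text in the "query" parameter.
request_url = ENDPOINT + "?" + urlencode({"query": query.strip()})
```

Because the query reuses the public `foaf:` ontology, the same request could be pointed at any end-point whose data uses that vocabulary.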


Page 38: Graph Databases and the Future of Large-Scale Knowledge Management

Conclusion

• Thank you for your time...

  ⋆ My homepage: http://markorodriguez.com
  ⋆ Neno/Fhat: http://neno.lanl.gov
  ⋆ Collective Decision Making Systems: http://cdms.lanl.gov
  ⋆ Faith in the Algorithm: http://faithinthealgorithm.net
  ⋆ MESUR: http://www.mesur.org
